[R] Bubble chart

2012-03-21 Thread ens
I have a matrix of p-values for each explanatory variable. Each row is an
area of the response variable and each column is an explanatory variable.

e.g. 
   PSA   pval_DOY pval_PDSIconcurrent pval_PDSIantecedent_annual_average pval_TMAXanomaly pval_FM100anomaly
1 NC06 0.96747495  0.6092668          0.53353019                         0.9301115        0.99801334
2 NC04 0.04699659  0.2759152          0.07024752                         0.6046828        0.03224094
3 NC01 0.71437394  0.9979173          0.85296024                         0.9977558        0.99833623
4 NC08 0.67315904  0.9970511          0.51756714                         0.7809994        0.99626038
5 NC07 0.55221280  0.5784208          0.43975219                         0.3669491        0.34898877
6 NC05 0.52089881  0.7191645          0.91972153                         0.4487460        0.94922430

I want to create a visual display in which, instead of numbers for the
p-values, I have a circle sized to represent the magnitude of each p-value.

symbols() is what I've found to do this, but it isn't clear to me how to
format the input for the function.

Any suggestions or help? Even a link would be helpful if someone else has
asked this question and it has been answered.
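A sketch of one way to feed a matrix to symbols(): plot one circle per cell, with the circle radius taken from the p-value. The matrix m below is a made-up stand-in for the real data, and the layout choices (axes, circle cap) are assumptions, not the poster's requirements.

```r
# Toy p-value matrix: rows = areas, columns = explanatory variables
set.seed(1)
m <- matrix(runif(12), nrow = 3,
            dimnames = list(c("NC01", "NC04", "NC06"), paste0("var", 1:4)))

# One circle per cell: x = column index, y = row index (column-major order
# to match as.vector(m)); inches = 0.3 caps the largest circle's size
symbols(x = rep(seq_len(ncol(m)), each = nrow(m)),
        y = rep(seq_len(nrow(m)), times = ncol(m)),
        circles = as.vector(m), inches = 0.3,
        xlab = "explanatory variable", ylab = "area",
        xaxt = "n", yaxt = "n")
axis(1, at = seq_len(ncol(m)), labels = colnames(m))
axis(2, at = seq_len(nrow(m)), labels = rownames(m))
```

symbols() only needs equal-length x, y, and circles vectors, so the main job is unrolling the matrix into that triplet.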

--
View this message in context: 
http://r.789695.n4.nabble.com/Bubble-chart-tp4494539p4494539.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Automatically adjust axis scales

2012-03-21 Thread Jim Lemon

On 03/22/2012 05:42 AM, Alaios wrote:

Would it be possible to change the axis at the end?
My data sets seem to be quite large, so I was thinking, for the plot and
the subsequent lines, to always keep the current minimum and maximum, like
plot(x)
minimum <- min(x)
lines(x1)
minimum <- c(minimum, min(x1))
lines(x2)
minimum <- c(minimum, min(x2))
Then, if there is a good way of altering the axis limits, I could set the
limits I want.


Hi Alex,
One way or another you are going to have to set the limits in your first 
call to "plot". You could do the plot first and collect the limits along 
the way, then do it again, but I can't see that this has any advantage 
over calculating the limits for the entire set of data that is to be 
plotted at the beginning and then doing the plots. If you want to get 
the limits for each data set separately, maybe:


range1 <- range(x1)
range2 <- range(x2)
range3 <- range(x3)
ylim <- range(c(range1, range2, range3))

However, I suspect that this is of more use in helping you understand 
what is happening than actually getting it done.
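A minimal sketch of the approach above, using three hypothetical vectors x1, x2, x3: compute the combined range first, then pass it as ylim in the single call to plot().

```r
# Hypothetical data sets; replace with the real series
x1 <- rnorm(100)
x2 <- rnorm(100, mean = 2)
x3 <- rnorm(100, mean = -2)

# Combined limits across all series, then one plot that fits everything
ylim <- range(c(range(x1), range(x2), range(x3)))
plot(x1, type = "l", ylim = ylim)
lines(x2, col = "red")
lines(x3, col = "blue")
```
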


Jim



[R] Any package recommended for time series graphics in R?

2012-03-21 Thread jpm miao
Hello,

   I would like to produce a few time series graphs with R. My data are
usually quarterly or monthly. I sometimes need to have two y-axes, one on
the left and the other on the right. Is there any package that produces
nice-looking time series graphs?

   Thanks,

miao



[R] calling java from R and using java time series double precision array

2012-03-21 Thread Hurr
I haven't had time to try R for over a year, but I have a colleague who
wants to. We work with time series, and the current version of our
calendar-time subroutines in Java converts in both directions between
linear time and calendar time.
We have used calendar time since 1965, starting out then with Fortran.
Calendar time can be CnYrMoDa | CnYrMoDaHr | CnYrMoDaHrMn | CnYrMoDaHrMnSc
 | CnYrMoDaHrMnScD | CnYrMoDaHrMnScDC | CnYrMoDaHrMnScDCM |
CnYrMoDaHrMnScDCMQ.
We use calendar time in string format and linear-time numbers in double
precision. Thus we can get down to calendar quadiseconds (tenths of
milliseconds), and we must always have thoroughly tested subroutines.
The unit of linear time can be converted to: da | hr | mn | sc | ds | cs |
ms | qs.
Limiting this discussion to reading time series data into R, I believe I
want R to call a Java method that reads a time series into a
two-dimensional array, with the first column being the linear time and
the other columns being the various time series data (functions) at those
times.
I am not sure that the double-precision Java array can be used by R, and I
would appreciate any other comments about this kind of interface with
Java.




[R] read wiff extension files into R

2012-03-21 Thread Suyan Tian
Does anyone know how to read or import a .wiff file (AB SCIEX Windows
interchange file format) into R? Thanks a lot.



Re: [R] sqrt(-x) vs. -x^0.5

2012-03-21 Thread cberry
Mike Williamson  writes:

> Thanks Sarah, All,
>
> I guess I never thought of a negative sign as an "operation", but
> knowing that it is considered an operation explains everything nicely.
>  Somewhere in it's underbelly, I suppose -4 is represented as "0 - 4"?

Not exactly. Here is the list of constituent elements of the expressions -4
and 0 - 4:

> as.list(quote(-4))
[[1]]
`-`

[[2]]
[1] 4

> as.list(quote(0-4))
[[1]]
`-`

[[2]]
[1] 0

[[3]]
[1] 4

The first case is unary, the second is binary. 

Exercise: See ?Syntax. Write the result of this line:

  lapply( as.list( quote(-1-4) ), as.list )

and (thereby) explain why the result of -1-4 isn't 3.

HTH,

Chuck

[rest deleted]


-- 
Charles C. Berry                            Dept of Family/Preventive Medicine
cberry at ucsd edu                          UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



[R] Summary values from Glm function (rms package)

2012-03-21 Thread Yonghao
Dear fellow R-users,

I'm using the Glm function (gamma family of distributions) from the rms
package to compare 2 groups on cost data. Although the summary function
does provide the mean cost difference and standard errors, I believe these
values are on the (natural) log scale. Is there a function to express these
values on the original scale of the response variable (i.e., dollars), so
that I could obtain the mean adjusted cost difference (and the 95% CI)?

Many thanks to everyone in advance!

YH 




[R] predict () for LDA and GLM

2012-03-21 Thread palanski
Hi!

I'm using GLM, LDA and NaiveBayes for binomial classification. My training
set is 70 rows long with 32 features, and my test set is 30 rows long with
32 features.

Using Naive Bayes, I can train a model, and then predict the test set with
it like so:

ass4q1.dLDA = lda(ass4q1.trainSet[,1]~ass4q1.trainSet[,2:3])
table(predict(ass4q1.dNB, ass4q1.testSetDF[,2:3]), ass4q1.testSetDF[,1])


However, when I do the same for LDA or GLM, it tells me that the number of
rows doesn't match, and my test data are not predicted. The error for GLM,
as an example, is:

Error in table(predict(ass4q1.dGLM, ass4q1.testSetDF[, 2:3]),
ass4q1.testSetDF[,  : 
  all arguments must have the same length
In addition: Warning message:
'newdata' had 30 rows but variable(s) found have 70 rows 

What am I missing?




[R] Macro or Loop info/help needed

2012-03-21 Thread Jeff

   I'm very new to R, having recently made the transition from SPSS and SAS.
   In a dataset named t4, I have about 20 variables that are named in a
   somewhat chronological order (e.g., q100ayr, q101ayr, q102ayr, q103ayr,
   etc.).
   Each variable contains a 2-digit year designation (e.g., 73, 74, 75, 76,
   etc.), which corresponds to the year something occurred (e.g., 73 = 1973).
   Each of the 20 variables is similarly designated, but corresponds to a
   different type of event/occurrence.
   I need to recode the data in this fashion:
   # first event
   t4$eventX1973 <- 0
   t4$eventX1973[t4$q100ayr == 73] <- 1
   t4$eventX1974 <- 0
   t4$eventX1974[t4$q100ayr == 74] <- 1
   t4$eventX1975 <- 0
   t4$eventX1975[t4$q100ayr == 75] <- 1
   # etc. for years between 73 and 88
   # second event
   t4$eventZ1973 <- 0
   t4$eventZ1973[t4$q101ayr == 73] <- 1
   t4$eventZ1974 <- 0
   t4$eventZ1974[t4$q101ayr == 74] <- 1
   t4$eventZ1975 <- 0
   t4$eventZ1975[t4$q101ayr == 75] <- 1
   # etc.
   The code above will work, but would be many lines long. I'm hoping someone
   can give me a quick introduction to what would work best in R. I assume some
   type of macro or loop.
   Thanks in advance.
   Jeff
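The repeated assignments above can be collapsed into a loop over the years. A sketch, using a toy t4 with only the q100ayr variable (the event and variable names follow the post; the data values are invented):

```r
# Toy data: the real t4 has ~20 such year variables
t4 <- data.frame(q100ayr = c(73, 75, 74, 88, 73))

# One indicator column per year, 73..88, for the first event
for (yr in 73:88) {
  t4[[sprintf("eventX19%d", yr)]] <- as.integer(t4$q100ayr == yr)
}
```

The same loop, wrapped in an outer loop over the ~20 source variables (q100ayr, q101ayr, ...) and their event labels, replaces all of the hand-written blocks.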


Re: [R] Reshape from long to wide

2012-03-21 Thread aly

Thanks a lot,

I tried one of the ways you guys showed me and it totally worked. Just for
fun, I tried all the others, and with some modifications here and there they
work fine too. It was time-consuming but definitely worth it as a learning
experience. Thanks again



[R] To overlay my raster and its boundary

2012-03-21 Thread Komine
Hi
I want to overlay my raster and its boundary, which is a shapefile.
When I use this code separately, all is OK:
# open raster
>library(raster)
>Image<-read.table("C:\\Users\\Documents\\Nouveau\\Frequence.txt",sep="",dec=",",header=TRUE)
>testo<-rasterFromXYZ(Image)
>plot(testo)
>testo2 <- aggregate(testo,fact=10, fun=mean)
>plot(testo2)
# open shapefile = boundary
>library(rgdal)
>test1<-readOGR(dsn="C:\\Users\\Documents\\DISC D\\Nouveau",layer="pays")
>plot(test1)

But how do I overlay both layers (raster and shapefile) to get a single map?
Thank you in advance





Re: [R] sqrt(-x) vs. -x^0.5

2012-03-21 Thread Bert Gunter
On Wed, Mar 21, 2012 at 5:21 PM, Mike Williamson  wrote:
> Thanks Sarah, All,
>
>    I guess I never thought of a negative sign as an "operation", but
> knowing that it is considered an operation explains everything nicely.
>  Somewhere in it's underbelly, I suppose -4 is represented as "0 - 4"?

Sheesh! And why do you suppose that?! Read up on floating point
arithmetic to find out how it is done.

-- Bert

>  Either way, I'm glad it is consistent & accurate, so that I didn't find
> myself in another pickle like "weak" typing and attempts to use time /date
> classes in 'R' have brought me.
>
>                               Thanks!
>                                      Mike
>
>
> ---
> [The theory of gravity] is to me so great an absurdity that I believe no
> Man who has in philosophical matters a competent faculty of thinking can
> ever fall into it.  -- Isaac Newton
>
>
>
> On Wed, Mar 21, 2012 at 4:37 PM, Rolf Turner  wrote:
>
>> On 22/03/12 12:17, Sarah Goslee wrote:
>>
>>> It's order of operations, and a good reason to always use
>>> parentheses: which is evaluated first, the unary minus or
>>> the raising-to-powers?
>>>
>>> (-4)^0.5
>>> -(4^0.5)
>>>
>>> sqrt(-4)
>>> -sqrt(4)
>>>
>>
>> If the OP *really* wants the square root of -4 he could do
>> sqrt(-4+0i) or (-4+0i)^0.5 (and get 0+2i in either case).
>>
>>    cheers,
>>
>>        Rolf Turner
>>
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] sqrt(-x) vs. -x^0.5

2012-03-21 Thread Mike Williamson
Thanks Sarah, All,

I guess I never thought of a negative sign as an "operation", but
knowing that it is considered an operation explains everything nicely.
Somewhere in its underbelly, I suppose -4 is represented as "0 - 4"?
Either way, I'm glad it is consistent and accurate, so that I haven't
found myself in another pickle like the ones that "weak" typing and the
time/date classes in R have brought me.

   Thanks!
  Mike


---
[The theory of gravity] is to me so great an absurdity that I believe no
Man who has in philosophical matters a competent faculty of thinking can
ever fall into it.  -- Isaac Newton



On Wed, Mar 21, 2012 at 4:37 PM, Rolf Turner  wrote:

> On 22/03/12 12:17, Sarah Goslee wrote:
>
>> It's order of operations, and a good reason to always use
>> parentheses: which is evaluated first, the unary minus or
>> the raising-to-powers?
>>
>> (-4)^0.5
>> -(4^0.5)
>>
>> sqrt(-4)
>> -sqrt(4)
>>
>
> If the OP *really* wants the square root of -4 he could do
> sqrt(-4+0i) or (-4+0i)^0.5 (and get 0+2i in either case).
>
>cheers,
>
>Rolf Turner
>



Re: [R] sqrt(-x) vs. -x^0.5

2012-03-21 Thread Rolf Turner

On 22/03/12 12:17, Sarah Goslee wrote:

It's order of operations, and a good reason to always use
parentheses: which is evaluated first, the unary minus or
the raising-to-powers?

(-4)^0.5
-(4^0.5)

sqrt(-4)
-sqrt(4)


If the OP *really* wants the square root of -4 he could do
sqrt(-4+0i) or (-4+0i)^0.5 (and get 0+2i in either case).

cheers,

Rolf Turner



Re: [R] sqrt(-x) vs. -x^0.5

2012-03-21 Thread Sarah Goslee
It's order of operations, and a good reason to always use
parentheses: which is evaluated first, the unary minus or
the raising-to-powers?

(-4)^0.5
-(4^0.5)

sqrt(-4)
-sqrt(4)
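Concretely (see ?Syntax: ^ binds tighter than unary minus, so -4^0.5 parses as -(4^0.5)):

```r
-4^0.5       # -2: parsed as -(4^0.5)
(-4)^0.5     # NaN: a real power of a negative number
sqrt(-4)     # NaN, with a warning
sqrt(-4+0i)  # 0+2i: the complex square root
```
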

Sarah

On Wed, Mar 21, 2012 at 7:09 PM, Mike Williamson  wrote:
> Hi Everyone,
>
>    I did a search through the archives and did not find an answer,
> although I must admit it is a hard search to do ( ^0.5 is tough to
> explicitly search for ).
>
>    I am sure there is some mathematically accurate reason to explain the
> following, but I guess I either never learned it or have since forgotten it.
>
>    In 'R', when I type (for instance):
>
> sqrt(-4)
>
>    I get
>
> NaN
>
>    but if I type in:
>
> -4 ^ 0.5
>
>    I get
>
> -2
>
>    I presume this is on purpose and correct, but it's the first time I've
> come across any theoretical difference between ^0.5 and sqrt.  What is the
> theoretical difference / meaning between these two operations?
>
>                              Thanks!
>                                   Mike
>

-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] sqrt(-x) vs. -x^0.5

2012-03-21 Thread Mike Williamson
Hi Everyone,

I did a search through the archives and did not find an answer,
although I must admit it is a hard search to do (^0.5 is tough to
search for explicitly).

I am sure there is some mathematically accurate reason to explain the
following, but I guess I either never learned it or have since forgotten it.

In 'R', when I type (for instance):

sqrt(-4)

I get

NaN

but if I type in:

-4 ^ 0.5

I get

-2

I presume this is on purpose and correct, but it's the first time I've
come across any theoretical difference between ^0.5 and sqrt.  What is the
theoretical difference / meaning between these two operations?

  Thanks!
   Mike


---
[The theory of gravity] is to me so great an absurdity that I believe no
Man who has in philosophical matters a competent faculty of thinking can
ever fall into it.  -- Isaac Newton



Re: [R] Using extract function for dates in sqldf

2012-03-21 Thread Gabor Grothendieck
On Wed, Mar 21, 2012 at 11:31 AM,   wrote:
>
> I'm trying to use sqldf to query for the earliest date of a blood test when
> patients have had multiple tests in a given year. My query looks like this:
>
> test11 <- sqldf("select CHILD_ID, min(SAMP_DATE)
>                 from lab
>                 group by CHILD_ID
>                 having extract (year from SAMP_DATE) = 2011")
>
> SAMP_DATE has class "date." I get the error message
>
> Error in sqliteExecStatement(con, statement, bind.data) :
>  RS-DBI driver: (error in statement: near "from": syntax error)

extract is not supported by SQLite.  Check the SQLite date and time
functions link on the left side of the sqldf home page.  (The other
three databases supported by sqldf all do support extract.)
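A sketch of the query rewritten with SQLite's strftime() in place of extract(). This assumes SAMP_DATE reaches SQLite as 'YYYY-MM-DD' text (sqldf stores Date columns as day numbers, so a Date column may need format() first); lab, CHILD_ID and SAMP_DATE are the poster's names, and the data here are toy values.

```r
library(sqldf)

# Toy stand-in for the poster's lab table, with dates as ISO text
lab <- data.frame(CHILD_ID  = c(1, 1, 2),
                  SAMP_DATE = c("2011-03-01", "2011-01-15", "2011-06-30"))

# Filter to 2011 with strftime, then take each child's earliest date
test11 <- sqldf("select CHILD_ID, min(SAMP_DATE) as SAMP_DATE
                 from lab
                 where strftime('%Y', SAMP_DATE) = '2011'
                 group by CHILD_ID")
```

With ISO-formatted text dates, min() also orders correctly, since lexicographic and chronological order coincide.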

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Trouble installing the XML package

2012-03-21 Thread Sarah Goslee
The first thing to do is update your rather old version of R. If the
2.14 binaries aren't available for your version of Ubuntu in the
main repo, there are instructions here:
http://cran.r-project.org/bin/linux/ubuntu/

If that still doesn't work, please provide the list with details of
your system and R installation.

Sarah

On Wed, Mar 21, 2012 at 3:34 PM, Felix Jarman
 wrote:
> Hello everyone,
>
> I am probably not the only one having trouble with this package but here goes.
> I want to install XML on Ubuntu. I installed libxml2-dev and
> everything works out fine until I get the following:
>
> Error in reconcilePropertiesAndPrototype(name, slots, prototype,
> superClasses,  :
>  No definition was found for superclass "namedList" in the
> specification of class "XMLOutputStream"
> Error : unable to load R code in package 'XML'
> ERROR: lazy loading failed for package ‘XML’
>
> Googling this was no help.
>
>
> This is the entire R console output:
>
>> install.packages("XML",dependencies=TRUE,configure.args=c("library","clean"))
> Warnung in install.packages("XML", dependencies = TRUE, configure.args
> = c("library",  :
>  Argument 'lib' fehlt: nutze '/home/yeti/R/i486-pc-linux-gnu-library/2.10'
> versuche URL 'http://ftp.osuosl.org/pub/cran/src/contrib/XML_3.9-4.tar.gz'
> Content type 'application/x-gzip' length 923501 bytes (901 Kb)
> URL geöffnet
> ==
> downloaded 901 Kb
>
> * installing *source* package ‘XML’ ...
> configure: WARNING: you should use --build, --host, --target
> configure: WARNING: you should use --build, --host, --target
> checking for library-gcc... no
> checking for gcc... gcc
> checking for C compiler default output file name... a.out
> checking whether the C compiler works... yes
> checking whether we are cross compiling... no
> checking for suffix of executables...
> checking for suffix of object files... o
> checking whether we are using the GNU C compiler... yes
> checking whether gcc accepts -g... yes
> checking for gcc option to accept ISO C89... none needed
> checking how to run the C preprocessor... gcc -E
> No ability to remove finalizers on externalptr objects in this verison of R
> checking for sed... /bin/sed
> checking for pkg-config... /usr/bin/pkg-config
> checking for xml2-config... /usr/bin/xml2-config
> USE_XML2 = yes
> SED_EXTENDED_ARG: -E
> Minor 7, Patch 6 for 2.7.6
> Located parser file -I/usr/include/libxml2/parser.h
> Checking for 1.8:  -I/usr/include/libxml2
> Using libxml2.*
> checking for gzopen in -lz... yes
> checking for xmlParseFile in -lxml2... yes
> checking for xmlHashSize in -lxml2... yes
> Using built-in xmlHashSize
> Checking DTD parsing (presence of externalSubset)...
> checking for xmlHashSize in -lxml2... yes
> Found xmlHashSize
> checking for xmlOutputBufferCreateBuffer in -lxml2... yes
> have xmlOutputBufferCreateBuffer()
> checking for xmlDocDumpFormatMemoryEnc in -lxml2... yes
> checking libxml/xmlversion.h usability... yes
> checking libxml/xmlversion.h presence... yes
> checking for libxml/xmlversion.h... yes
> Expat:  FALSE
> Checking for return type of xmlHashScan element routine.
> No return value for xmlHashScan
> xmlNs has a context field
> Checking for cetype_t enumeration
> No cetype_t enumeration defined in R headers.
> checking for xmlsec1-config... no
> nodegc default
> Version has XML_WITH_ZLIB
> Version has xmlHasFeature()
>
> 
> Configuration information:
>
> Libxml settings
>
> libxml include directory: -I/usr/include/libxml2
> libxml library directory: -lxml2 -lz  -lxml2
> libxml 2:                 -DLIBXML2=1
>
> Compilation flags:         -DLIBXML -I/usr/include/libxml2
> -DUSE_EXTERNAL_SUBSET=1 -DROOT_HAS_DTD_NODE=1 -DDUMP_WITH_ENCODING=1
> -DUSE_XML_VERSION_H=1 -DXML_ELEMENT_ETYPE=1 -DXML_ATTRIBUTE_ATYPE=1
> -DNO_XML_HASH_SCANNER_RETURN=1 -DLIBXML_NAMESPACE_HAS_CONTEXT=1
> -DHAVE_XML_WITH_ZLIB=1 -DHAVE_XML_HAS_FEATURE=1 -DUSE_R=1 -D_R_=1
> -DHAVE_VALIDITY=1 -DXML_REF_COUNT_NODES=1
> Link flags:               -lxml2 -lz  -lxml2
>
> 
> configure: creating ./config.status
> config.status: creating src/Makevars
> config.status: creating R/supports.R
> config.status: creating inst/scripts/RSXML.csh
> config.status: creating inst/scripts/RSXML.bsh
> ** libs
> gcc -std=gnu99 -I/usr/share/R/include -DLIBXML -I/usr/include/libxml2
> -DUSE_EXTERNAL_SUBSET=1 -DROOT_HAS_DTD_NODE=1 -DDUMP_WITH_ENCODING=1
> -DUSE_XML_VERSION_H=1 -DXML_ELEMENT_ETYPE=1 -DXML_ATTRIBUTE_ATYPE=1
> -DNO_XML_HASH_SCANNER_RETURN=1 -DLIBXML_NAMESPACE_HAS_CONTEXT=1
> -DHAVE_XML_WITH_ZLIB=1 -DHAVE_XML_HAS_FEATURE=1 -DUSE_R=1 -D_R_=1
> -DHAVE_VALIDITY=1 -DXML_REF_COUNT_NODES=1  -I. -DLIBXML2=1     -fpic
> -g -O2 -c DocParse.c -o DocParse.o
> In file included from DocParse.c:13:
> Utils.h:230:2: warning: #warning "Redefining COPY_TO_USER_STRING to
> use encoding from XML parser"
> gcc -std=gnu99 -I/usr/share/R/include -DLIBXML -I/us

[R] fwdmsa package: Error in search.normal(X[samp, ], verbose = FALSE) : At least one item has no variance

2012-03-21 Thread Rob Cassidy
I'm using the fwdmsa package to identify deviant cases in a Mokken scale
analysis. I've run into a problem., separate from the one I posted
previously. The problem comes with items that are "easy" by IRT standards. A
good scale should include a range of difficulties; yet when I include "easy"
items in a forward search I continuously run into the problem that these
items demonstrate no variability in the smaller subsamples selected for
forward search. I can resolve the problem by only running the analysis
without the easier items, but this is not ideal for for the analysis. 

The data are 150 responses to a 37 item test. For the first 15 items, they
look like this (the variable names are cumbersome and so have been removed)
: 
> head(by364.data) 

111101011010111 1
211101111110011 1
301110010111011 0
411100111111101 1
511110101111011 0
611111111111111 1


You can see that items 1, 2, 9 and 13 demonstrate no variance in this small
sample and thus muck up the analysis. This happens even for items with
plenty of variance overall: being easy items, they might have a .8-.9
probability of endorsement, so even samples of 25-50 still run into this
variance problem. I have seeded the forward search with specific cases that
create the necessary variance, but these artificial subsamples tend to be
loaded with deviant observations and are thus excluded after a couple of
steps, recreating the lack-of-variance problem.

Any suggestions? 

Robert 





Re: [R] small scales in fwdmsa

2012-03-21 Thread Rob Cassidy
Sorry, I mistyped "fs.MSA(by364.data)" in the previous post.




[R] small scales in fwdmsa

2012-03-21 Thread Rob Cassidy
I'm using the fwdmsa package to identify deviant cases in a Mokken scale
analysis. I've run into a problem. When I use scales comprising only a few
items, I tend to get an error:
 Error in y[order(res[-msamp])][1:(length(samp) + 1 - length(msamp))] : 
  only 0's may be mixed with negative subscripts

I understand that the error is triggered when the algorithm is fetching
cases to enter into the next step of the forward search. I don't understand
what I can do to remedy this error.

The data are dichotomized (1,0) responses from a multiple-choice exam that
150 students have completed. If I run the entire test (37 items), the
fwd.MSA algorithm works fine. However, the entire test is not
unidimensional, so I want to perform separate analyses with the several
unidimensional scales comprised by the entire test. Yet when I select those
4-5 item scales, I run into this error.

Any ideas how to proceed?

The data are 150 responses to a 37-item test. For the first 15 items, they
look like this (the variable names are cumbersome and so have been removed):
> head(by364.data)
1 1 1 1 0 1 0 1 1 0 1 0 1 1 1 1
2 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1
3 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0
4 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1
5 1 1 1 1 0 1 0 1 1 1 1 0 1 1 0
6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


Running fwd.MSA(by364.data) on the full data set works fine. However, with
the shorter scales (kept short for reasons of unidimensionality), I
repeatedly run into the error above. Any suggestions?
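A hedged first check (the item indices below are hypothetical; a related thread on this list reports the error "At least one item has no variance"): with only 4-5 items, a question that all 150 students answered identically has zero variance, which the forward search may not handle. Dropping constant items before calling fwd.MSA() is worth trying:

```r
# Hypothetical item subset; replace with the columns of your actual scale.
scale.items <- by364.data[, c(1, 4, 7, 9, 12)]
# Drop items with no variance (answered identically by everyone).
keep <- apply(scale.items, 2, var) > 0
scale.items <- scale.items[, keep, drop = FALSE]
# Then re-run the forward search on the cleaned subset:
# fwd.MSA(scale.items)
```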

Robert


--
View this message in context: 
http://r.789695.n4.nabble.com/small-scales-in-fwdmsa-tp4493479p4493479.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Trouble installing the XML package

2012-03-21 Thread Felix Jarman
Hello everyone,

I am probably not the only one having trouble with this package but here goes.
I want to install XML on Ubuntu. I installed libxml2-dev and
everything works out fine until I get the following:

Error in reconcilePropertiesAndPrototype(name, slots, prototype,
superClasses,  :
  No definition was found for superclass "namedList" in the
specification of class "XMLOutputStream"
Error : unable to load R code in package 'XML'
ERROR: lazy loading failed for package ‘XML’

Googling this was no help.


This is the entire R console output:

> install.packages("XML",dependencies=TRUE,configure.args=c("library","clean"))
Warning in install.packages("XML", dependencies = TRUE, configure.args
= c("library",  :
  argument 'lib' is missing: using '/home/yeti/R/i486-pc-linux-gnu-library/2.10'
trying URL 'http://ftp.osuosl.org/pub/cran/src/contrib/XML_3.9-4.tar.gz'
Content type 'application/x-gzip' length 923501 bytes (901 Kb)
opened URL
==
downloaded 901 Kb

* installing *source* package ‘XML’ ...
configure: WARNING: you should use --build, --host, --target
configure: WARNING: you should use --build, --host, --target
checking for library-gcc... no
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
No ability to remove finalizers on externalptr objects in this verison of R
checking for sed... /bin/sed
checking for pkg-config... /usr/bin/pkg-config
checking for xml2-config... /usr/bin/xml2-config
USE_XML2 = yes
SED_EXTENDED_ARG: -E
Minor 7, Patch 6 for 2.7.6
Located parser file -I/usr/include/libxml2/parser.h
Checking for 1.8:  -I/usr/include/libxml2
Using libxml2.*
checking for gzopen in -lz... yes
checking for xmlParseFile in -lxml2... yes
checking for xmlHashSize in -lxml2... yes
Using built-in xmlHashSize
Checking DTD parsing (presence of externalSubset)...
checking for xmlHashSize in -lxml2... yes
Found xmlHashSize
checking for xmlOutputBufferCreateBuffer in -lxml2... yes
have xmlOutputBufferCreateBuffer()
checking for xmlDocDumpFormatMemoryEnc in -lxml2... yes
checking libxml/xmlversion.h usability... yes
checking libxml/xmlversion.h presence... yes
checking for libxml/xmlversion.h... yes
Expat:  FALSE
Checking for return type of xmlHashScan element routine.
No return value for xmlHashScan
xmlNs has a context field
Checking for cetype_t enumeration
No cetype_t enumeration defined in R headers.
checking for xmlsec1-config... no
nodegc default
Version has XML_WITH_ZLIB
Version has xmlHasFeature()


Configuration information:

Libxml settings

libxml include directory: -I/usr/include/libxml2
libxml library directory: -lxml2 -lz  -lxml2
libxml 2: -DLIBXML2=1

Compilation flags: -DLIBXML -I/usr/include/libxml2
-DUSE_EXTERNAL_SUBSET=1 -DROOT_HAS_DTD_NODE=1 -DDUMP_WITH_ENCODING=1
-DUSE_XML_VERSION_H=1 -DXML_ELEMENT_ETYPE=1 -DXML_ATTRIBUTE_ATYPE=1
-DNO_XML_HASH_SCANNER_RETURN=1 -DLIBXML_NAMESPACE_HAS_CONTEXT=1
-DHAVE_XML_WITH_ZLIB=1 -DHAVE_XML_HAS_FEATURE=1 -DUSE_R=1 -D_R_=1
-DHAVE_VALIDITY=1 -DXML_REF_COUNT_NODES=1
Link flags:   -lxml2 -lz  -lxml2


configure: creating ./config.status
config.status: creating src/Makevars
config.status: creating R/supports.R
config.status: creating inst/scripts/RSXML.csh
config.status: creating inst/scripts/RSXML.bsh
** libs
gcc -std=gnu99 -I/usr/share/R/include -DLIBXML -I/usr/include/libxml2
-DUSE_EXTERNAL_SUBSET=1 -DROOT_HAS_DTD_NODE=1 -DDUMP_WITH_ENCODING=1
-DUSE_XML_VERSION_H=1 -DXML_ELEMENT_ETYPE=1 -DXML_ATTRIBUTE_ATYPE=1
-DNO_XML_HASH_SCANNER_RETURN=1 -DLIBXML_NAMESPACE_HAS_CONTEXT=1
-DHAVE_XML_WITH_ZLIB=1 -DHAVE_XML_HAS_FEATURE=1 -DUSE_R=1 -D_R_=1
-DHAVE_VALIDITY=1 -DXML_REF_COUNT_NODES=1  -I. -DLIBXML2=1 -fpic
-g -O2 -c DocParse.c -o DocParse.o
In file included from DocParse.c:13:
Utils.h:230:2: warning: #warning "Redefining COPY_TO_USER_STRING to
use encoding from XML parser"
gcc -std=gnu99 -I/usr/share/R/include -DLIBXML -I/usr/include/libxml2
-DUSE_EXTERNAL_SUBSET=1 -DROOT_HAS_DTD_NODE=1 -DDUMP_WITH_ENCODING=1
-DUSE_XML_VERSION_H=1 -DXML_ELEMENT_ETYPE=1 -DXML_ATTRIBUTE_ATYPE=1
-DNO_XML_HASH_SCANNER_RETURN=1 -DLIBXML_NAMESPACE_HAS_CONTEXT=1
-DHAVE_XML_WITH_ZLIB=1 -DHAVE_XML_HAS_FEATURE=1 -DUSE_R=1 -D_R_=1
-DHAVE_VALIDITY=1 -DXML_REF_COUNT_NODES=1  -I. -DLIBXML2=1 -fpic
-g -O2 -c EventParse.c -o EventParse.o
In file included from EventParse.c:15:
Utils.h:230:2: warning: #warning "Redefining COPY_TO_USER_STRING to
use encoding from XML parser"
EventParse.c: In function ‘RS_XML_textHandler’:
EventParse.c:41

Re: [R] glmnet: obtain predictions using predict and also by extracting coefficients

2012-03-21 Thread Weidong Gu
Hi Juliet,

First of all, cv.glmnet is used to estimate lambda by cross-validation. To
get a glmnet prediction, you should use the glmnet function, which uses
all the data in the training set. Second, you constructed testX from a
different data set (data.test.std) than the one passed to predict
(data.test), so it's no surprise the predictions are different.
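A hedged sketch reusing the objects from the quoted script: glmnet standardizes internally but reports coefficients on the original scale, so the manual prediction should use the unstandardized test matrix:

```r
# Coefficients (intercept first) on the ORIGINAL variable scale.
b <- as.vector(coef(myglmnet, s = "lambda.min"))
# Multiply by the raw (unstandardized) test matrix, with an intercept column.
yhat_manual <- cbind(1, data.test) %*% b
# This should agree with predict(); compare:
all.equal(as.vector(yhat_manual), as.vector(yhat_enet))
```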

Weidong Gu



On Wed, Mar 21, 2012 at 2:35 PM, Juliet Hannah  wrote:
> All,
>
> For my understanding, I wanted to see if I can get glmnet predictions
> using both the predict function and also by multiplying coefficients
> by the variable matrix. This is not worked out. Could anyone suggest
> where I am going wrong?
> I understand that I may not have the mean/intercept correct, but the
> scaling is also off, which suggests a bigger mistake.
>
>  Thanks for your help.
>
> Juliet Hannah
>
>
> library(ElemStatLearn)
> library(glmnet)
>
> data(prostate)
>
> # training data
> data.train <- prostate[prostate$train,]
> y <- data.train$lpsa
>
> # isolate predictors
> data.train <- as.matrix(data.train[,-c(9,10)])
>
> # test data
> data.test <- prostate[!prostate$train,]
> data.test <-  as.matrix(data.test[,-c(9,10)])
>
> # scale test data  by using means and sd from training data
>
> trainMeans <- apply(data.train,2,mean)
> trainSDs <- apply(data.train,2,sd)
>
> # create standardized test data
>
> data.test.std <- sweep(data.test, 2, trainMeans)
> data.test.std <- sweep(data.test.std, 2, trainSDs, "/")
>
> # fit training model
>
> myglmnet =cv.glmnet(data.train,y)
>
> # predictions by using predict function
>
> yhat_enet <- predict(myglmnet,newx=data.test, s="lambda.min")
>
> # attempting to get predictions by using coefficients
>
> beta  <- as.vector( t(coef(myglmnet,s="lambda.min")))
>
> testX <- cbind(1,data.test.std)
>
> yhat2  <- testX %*% beta
>
> # does not match
>
> plot(yhat2,yhat_enet)
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multivariate ordinal probit regression vglm()

2012-03-21 Thread Trey Batey
Hello, all.

I'm investigating the rate at which skeletal joint surfaces pass
through a series of ordered stages (changes in morphology).  Current
statistical methods in this type of research use various logit or
probit regression techniques (e.g., proportional odds logit/probit,
forward/backward continuation ratio, or restricted/unrestricted
cumulative probit).  Data typically include the predictor (age) and
one or more response variables (the stages of skeletal morphology).
For example, the pubic symphysis and auricular surface (two joint
surfaces of the pelvis) may be observed in three and four stages,
respectively (see the sample dataframe "refdata" below).

  age pube3 auric4
1  32     3      2
2  42     3      2
3  27     2      1
4  39     2      1
5  85     3      4

I've had some success in fitting the ordinal probit model using both
polr(method="probit") in the MASS library and vglm() in the VGAM
library, but I've hit a wall when it comes to fitting a model that
includes both response variables ("pube3" and "auric4" in the sample
dataframe above).  I've included the two univariate models below, but
I'd like to know how to model the two response variables on
age---returning the coefficients for each response AND the correlation
parameter, since the two responses are not independent.  If it would
help, please feel free to access the dataframe
(https://docs.google.com/open?id=0B5zZGW2utJN0TEctcW1oblFRcTJrNDVLOVBmRWRaQQ).
 Thanks in advance.

--Trey

***
Trey Batey---Anthropology Instructor
Mt. Hood Community College
26000 SE Stark St.
Gresham, OR 97030
trey.ba...@mhcc.edu or ekt.ba...@gmail.com


## unrestricted cumulative probit model for pubic scores
> mod.pube3
Call:
vglm(formula = refdata$pube3 ~ refdata$age, family = cumulative(link =
"probit",
parallel = FALSE, reverse = TRUE))

Coefficients:
(Intercept):1 (Intercept):2     ref$age:1     ref$age:2
  -1.65895567   -2.14755951    0.06688242    0.04055919

Degrees of Freedom: 1492 Total; 1488 Residual
Residual Deviance: 1188.909
Log-likelihood: -594.4543

## unrestricted cumulative probit model for auricular scores
> mod.auric4
Call:
vglm(formula = refdata$auric4 ~ refdata$age, family = cumulative(link
= "probit",
parallel = FALSE, reverse = TRUE))

Coefficients:
(Intercept):1 (Intercept):2 (Intercept):3     ref$age:1     ref$age:2
  -2.07719235   -2.43422370   -2.99123098    0.07319632    0.05133132
    ref$age:3
   0.03797696

Degrees of Freedom: 2238 Total; 2232 Residual
Residual Deviance: 1583.47
Log-likelihood: -791.7348

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Doubts about mixed effect models

2012-03-21 Thread Bert Gunter
Post this on the r-sig-mixed-models list rather than here.

However, fwiw, it is nonsense to estimate a random effect with a
sample size of 3. That's trying to estimate variance with a sample
size of 3. You can't do it with any meaningful precision. Whether or
not the effect really **is** conceptually random is beside the point.
I suggest you cross off your list of statistical advisers anyone who
says otherwise.

Entropy cannot be denied!

-- Bert

On Wed, Mar 21, 2012 at 11:01 AM, Lívia Dorneles Audino
 wrote:
> Hi everyone!
>
>
>
> I have some doubts about mixed effect models and I hope someone could help
> me. I´m trying to analyze a dataset coming from samples of dung beetles in
> the same forest fragments along 3 consecutive years (1994, 1995 and 1996)
> and 14 years after (2010). I sampled dung beetles in 18 different fragments
> with different sizes and different degrees of isolation. My aim is to
> determine whether total species richness change over time in forest
> fragments and to verify the influence of fragment size and isolation on
> species richness. However, I'm trying to find a way to consider in the
> analyses the temporal pseudo-replication in the data. So, I decided to use
> mixed effects models to analyze this data, but I still have doubts about
> how I should construct the models. When I asked for some help I received
> three different answers about how to construct the model.
>
>
> The first suggestion was to treat year as a fixed rather than a random
> effect because it won't be practical to estimate the variance of a
> random effect
> with only four levels. So, I constructed the model like cited below:
>
> m1 <- lmer(riqueza ~ área*ano + isolamento*ano + (1|fragmento), family=poisson)
>
>
> The second suggestion proposed to treat year as a random effect, as cited
> bellow:
>
> m1 <- lmer(riqueza ~ área*ano + isolamento*ano + (ano|fragmento), family=poisson)
>
>
> And the third suggestion also proposed to treat year as a random effect,
> but to consider it *as continuous variable rather than categorical*. In the
> models above I consider year as a categorical variable.
>
> m1 <- lmer(riqueza ~ área*ano + isolamento*ano + (ano|fragmento), family=poisson)
>
>
> When I analyze my dataset using the second and the third model, I always
> get a singular convergence warning: "In mer_finalize(ans): singular
> convergence (7)". What does that mean? Does anyone have an idea about
> how I can solve this issue?
>
>
>
> I also need to know which one of these models is more appropriate to the
> dataset available. Does anyone have some suggestions?
>
> Thanks in advance!
>
> Lívia.
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] runif error - maximum limit?

2012-03-21 Thread Diann Prosser
Dear all,

I am receiving the error below, I think because my n is exceeding the
allowable limit for a vector.
Can anyone confirm, and help with the following questions?

The function and error:
> stPte8 <- rtrunc(rnorm, nx * ny * nsimu, linf=0, mean=as.vector(PTmn),
> sd=as.vector(PTsd))
Error in runif(n, min = pinf, max = psup) : invalid arguments

My total n is calculated by:
nx=4053, ny=4848, nsimu=1000,
n = nx*ny*nsimu = 19,648,944,000

I believe the maximum size of a vector is 2^31-1 (2,147,483,647) - clearly
smaller than my n.

I have done some initial reading about how to deal with large datasets in R,
with reference to bigmemory and ff. For anyone who has experience in this,
would you have advice regarding using these packages as opposed to creating
a work-around such as reducing the size of the matrix (the ny by nx),
batching the simulations, then piecing them back together outside of R
(ArcGIS)?

Thanks for any insights you might have.
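One possible work-around, sketched under the assumption that the simulations are independent: batch the nsimu draws so each chunk stays below the 2^31 - 1 element limit, writing each chunk to disk before drawing the next. (The rnorm() call below is a stand-in for the actual rtrunc() call.)

```r
nx <- 4053; ny <- 4848; nsimu <- 1000
batch <- 50  # 4053 * 4848 * 50 is about 9.8e8 elements, below 2^31 - 1
for (b in seq(1, nsimu, by = batch)) {
  z <- rnorm(nx * ny * batch)  # stand-in for rtrunc(rnorm, ..., linf = 0, ...)
  # Persist the chunk, then free it before the next iteration, e.g.:
  # writeBin(z, sprintf("chunk_%04d.bin", b)); rm(z); gc()
}
```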


> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit) -- 88GB RAM

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] MASS_7.3-17  rgdal_0.7-8  raster_1.9-70sp_0.9-97   
mc2d_0.1-12  mvtnorm_0.9-9992

loaded via a namespace (and not attached):
[1] grid_2.14.2lattice_0.20-0

--
View this message in context: 
http://r.789695.n4.nabble.com/runif-error-maximum-limit-tp4493486p4493486.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Graphic legend with mathematical symbol, numeric variable and character variable

2012-03-21 Thread Peter Ehlers

Hi David,

It's best to keep the mailing list in the loop so I'm cc-ing to R-help.

I've rethought your problem and propose a different solution.

The main problem is to create a vector of expressions to be used in
the legend.
(see https://stat.ethz.ch/pipermail/r-help/2011-February/270106.html)

  tau.vals <- c(51,88,200)
  types <- c("B&K UA 1404","01dB BAP 21","RION WS3")

  leg <- vector("expression", 3)
  for (i in 1:3) leg[i] <-
    substitute(expression(TYPE ~ group("(", tau == tauval, ")")),
               list(TYPE = types[i], tauval = tau.vals[i]))[2]

  plot(0);legend(x="topright",legend=leg)

The idea is to create an expression vector and then
fill its components appropriately. We need the second
component of the substitute call.

Peter Ehlers

On 2012-03-21 06:49, "ECOTIÈRE David (Responsable d'activité) - CETE 
Est/LRPC de Strasbourg/6 Acoustique" wrote:

Well, I have some problems with this solution:

Here is my (real) code:

tau<-c(51,88,200)
types<-c("B&K UA 1404","01dB BAP 21","RION WS3")
types<-gsub(pattern=" ", replacement="~",x=types)
leg<-parse(text=paste(types,"(","tau==",tau,"min)", sep='~'))
plot(0)
legend(x="topright",legend=leg)


I get an error after the 'parse' command:

Error in parse(text = paste(types, "(", "tau==", tau, "min)", sep =
"~")) :
:2:3: unexpected symbol
1: B&K~UA~1404~(~tau==~51~min)
2: 01dB
   ^

But there is no error if "01dB" is replaced by "dB01" (which is not what I
want, of course). It seems as if it is impossible to have a name that
starts with a number.

Any idea ?

Thank you

David
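A possible workaround (an untested sketch): a label such as "01dB BAP 21" cannot be parsed as a plotmath name because R names may not start with a digit, but it parses fine as a quoted string literal inside the expression:

```r
tau <- c(51, 88, 200)
types <- c("B&K UA 1404", "01dB BAP 21", "RION WS3")
# Embed each label as a string literal so parse() never sees "01dB" as a name.
leg <- parse(text = paste0('"', types, '"', " ~ (tau == ", tau, " ~ min)"))
plot(0)
legend("topright", legend = leg)
```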

Le 21/03/2012 12:57,>  Peter Ehlers (par Internet) a écrit :

On 2012-03-20 06:09, "ECOTIÈRE David (Responsable d'activité) - CETE
Est/LRPC de Strasbourg/6 Acoustique" wrote:

Hi,

I'd like to make a legend with a mix of mathematical symbol (tau),
numeric variable and character variables.I have tried :


types<-c("Type 1","Type 2","Type 2")
tau<-c(1,3,2)
legend(x="topright",legend=paste(types,"tau=",expression(tau)))



but it doesn't work: the 'tau' symbol is not written in its 'symbol
style' but as 'tau'

Any (good) idea ?
Thank you in advance !

David


Does this do what you want:

types<- paste('Type', 1:3, sep='~')
mytau<- 4:6
leg<- parse(text=paste(types, "~~tau==", mytau, sep='~'))
plot(0)
legend('topright', legend=leg)

The '~' symbols generate spaces; use more if you want more spacing.


Peter Ehlers




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] "cannot change working directory"

2012-03-21 Thread mela
Hi, 

I need to write a program to process some results from an experiment. I
have the results saved in two different folders, named "tuesday" and
"wednesday" (both located within the same parent folder). When I wrote the
program for "tuesday" I had no problem running it, but when I changed the
working directory to "wednesday" I got the "cannot change working
directory" message. I don't understand what's different between the two.

my first code was: 

setwd("/Users/user/Desktop/Recorded\ results/tuesday") 

and my second code is: 

setwd("/Users/user/Desktop/Recorded\ results/wednesday") 

and I copied the exact location from the terminal, so I can't have typos (I
am using a Mac, if that makes any difference).

Any suggestions? I will be grateful for any insight.

thanks 

mela 
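A hedged debugging sketch (the path is the one from the post): inside an R string the shell-style backslash before a space is unnecessary — recent versions of R even treat "\ " as an unrecognized escape — so plain spaces inside quotes are enough. To see what R actually finds:

```r
p <- "/Users/user/Desktop/Recorded results/wednesday"
file.exists(p)          # FALSE would mean R cannot see the path as typed
list.files(dirname(p))  # compare exact names: case, accents, trailing spaces
file.info(p)$isdir      # must be TRUE for setwd(p) to succeed
```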

--
View this message in context: 
http://r.789695.n4.nabble.com/cannot-change-working-directory-tp4492866p4492866.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to create a function for a self created class.

2012-03-21 Thread ankit.sethi
Hi, 
I need to create a method for a class named 'simpleOLS' (which I have
created) that will compute the coefficients of the predictors.

Here is my code: 

#---
 

setClass("simpleOLS", representation(dataset="matrix", beta="matrix",
x="matrix", y="matrix",   var="character")) 

computeBetas = function(dataset,str) class('simpleOLS') 
  
setMethod("computeBetas","simpleOLS",betas) 

betas = function(dataset, str) 
{ 
  x<- cbind(int=1,dataset[-which(colnames(dataset) %in% str)]) 
  y<- dataset[,length(dataset)] 
  names(x)<- NULL 
  names(y)<- NULL 
  x.numeric<- sapply(x, as.numeric) #converting into numeric type 
  y.numeric<- sapply(y, as.numeric) #converting into numeric type 
  transpose.x<- t(x.numeric) 
  X<-transpose.x%*%x.numeric 
  inv<- ginv(X, tol=sqrt(.Machine$double.eps)) 
  Y<- inv%*%transpose.x 
  beta<- Y%*%y.numeric 
  return(beta) 
} 

T2<-computeBetas(data.setA1,str) 
#
 

But when I call the function, it returns [1] "character", which is not
the intended output. I am new to object-oriented programming in R, and any
help will be appreciated. I would also like to know if there is any
problem with the way I have created the function.
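For reference, a minimal sketch of the usual S4 pattern — declare a generic with setGeneric() first, then register the method for the class with setMethod(). The class here is stripped down to two slots for brevity; the names follow the post and the data are made up:

```r
setClass("simpleOLS", representation(x = "matrix", y = "matrix"))

# Declare the generic, then attach the method to the class.
setGeneric("computeBetas", function(object) standardGeneric("computeBetas"))
setMethod("computeBetas", "simpleOLS", function(object) {
  X <- cbind(1, object@x)               # prepend an intercept column
  solve(t(X) %*% X, t(X) %*% object@y)  # OLS via the normal equations
})

fit <- new("simpleOLS",
           x = matrix(rnorm(20), 10, 2),
           y = matrix(rnorm(10), 10, 1))
computeBetas(fit)  # 3 x 1 matrix: intercept and two slopes
```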

Thanks & Regards,

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-create-a-function-for-a-self-created-class-tp4493288p4493288.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Doubts about mixed effect models

2012-03-21 Thread Lívia Dorneles Audino
Hi everyone!



I have some doubts about mixed effect models and I hope someone can help
me. I'm trying to analyze a dataset of dung beetles sampled in the same
forest fragments over 3 consecutive years (1994, 1995 and 1996) and again
14 years later (2010). I sampled dung beetles in 18 different fragments
with different sizes and different degrees of isolation. My aim is to
determine whether total species richness changes over time in forest
fragments and to verify the influence of fragment size and isolation on
species richness. However, I need a way to account for the temporal
pseudo-replication in the data, so I decided to use mixed effects models
to analyze these data, but I still have doubts about how I should
construct the models. When I asked for help I received three different
answers about how to construct the model.


The first suggestion was to treat year as a fixed rather than a random
effect because it won't be practical to estimate the variance of a
random effect
with only four levels. So, I constructed the model like cited below:

m1 <- lmer(riqueza ~ área*ano + isolamento*ano + (1|fragmento), family=poisson)


The second suggestion proposed to treat year as a random effect, as cited
below:

m1 <- lmer(riqueza ~ área*ano + isolamento*ano + (ano|fragmento), family=poisson)


And the third suggestion also proposed to treat year as a random effect,
but to consider it as a continuous variable rather than categorical. (In
the models above I treat year as a categorical variable.)

m1 <- lmer(riqueza ~ área*ano + isolamento*ano + (ano|fragmento), family=poisson)


When I analyze my dataset using the second and the third model, I always
get a singular convergence warning: "In mer_finalize(ans): singular
convergence (7)". What does that mean? Does anyone have an idea about
how I can solve this issue?



I also need to know which one of these models is more appropriate to the
dataset available. Does anyone have some suggestions?
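For reference, a hedged sketch of the first two suggestions in current lme4 syntax (glmer(), not lmer(), takes family = poisson; the data frame d is assumed, and the column names are taken from the post):

```r
library(lme4)

# Suggestion 1: year (ano) as a fixed factor, fragment as random intercept.
m1 <- glmer(riqueza ~ área * factor(ano) + isolamento * factor(ano) +
              (1 | fragmento), data = d, family = poisson)

# Suggestion 2: a random slope of year within fragment.
m2 <- glmer(riqueza ~ área * ano + isolamento * ano +
              (ano | fragmento), data = d, family = poisson)
```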

Thanks in advance!

Lívia.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "cannot change working directory"

2012-03-21 Thread mela
thanks Michael and William,

I have resorted to moving the "wednesday" folder into the "tuesday" folder
and keeping "tuesday" as the working directory. This seemed to solve the
problem, but I think it also means that I do have permission to access the
"wednesday" folder. I don't quite understand what went wrong, but it's not
the first time I've experienced problems with programs on my Mac. Could it
be a bug or a compatibility issue?

thanks again

mela

--
View this message in context: 
http://r.789695.n4.nabble.com/cannot-change-working-directory-tp4492812p4493039.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Assign names to the assets in portfolio frontier plot. Using frontierPlot fPortfolio.

2012-03-21 Thread PaulK
Hi,

I am having trouble assigning names to the assets in the portfolio frontier
plot using frontierPlot() in the fPortfolio package. Can anyone please help
me? If I use tailoredFrontierPlot() in fPortfolio the assets are given
names, but I don't want to plot the capital market line etc. Does anyone
know how to remove elements from the tailoredFrontierPlot() function?



Best Regards
Paul

--
View this message in context: 
http://r.789695.n4.nabble.com/Assign-names-to-the-assets-in-portfolio-frontier-plot-Using-frontierPlot-fPortfolio-tp4492973p4492973.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anova.lm F test confusion

2012-03-21 Thread Bert Gunter
Michelle:

You need to work with someone locally who understands basic
statistics, as you are clearly out of your depth. Posting to this list
is highly unlikely to meet your needs, nor is this an appropriate
place or means to learn statistics -- it's for help on R. (Yes, they
do overlap, but it is generally assumed that the poster knows the
statistics but needs help with R. You clearly do not fit that
description).

Cheers,
Bert
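For completeness, on the non-nested question quoted below: information criteria are the usual tool for comparing non-nested linear models fitted to the same response. A hedged sketch with made-up data:

```r
set.seed(1)
d <- data.frame(y = rnorm(50), x = rnorm(50), w = rnorm(50),
                z = rnorm(50), v = rnorm(50))
m1 <- lm(y ~ x + w, data = d)      # first candidate model
m2 <- lm(y ~ x + z + v, data = d)  # non-nested alternative
AIC(m1, m2)  # lower AIC: better fit/complexity trade-off on the same y
```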

On Wed, Mar 21, 2012 at 7:11 AM, msteane  wrote:
> What if the model isn't nested, i.e. I want to test y=x+w vs y=x+z+v.   Is
> there a valid test/method to compare these (other than comparing R squared
> values)?  They are both multiple regression models.
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/anova-lm-F-test-confusion-tp4490211p4492402.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in file(file, "rt") : cannot open the connection

2012-03-21 Thread djhayes

Hi there,

I migrated from Windows7 to Ubuntu (11.10) and am trying to get my R 
back online.

I've switched from Tinn-R to RKWard which seems to work well.

My problem at the moment is that I'm no longer able to read in the data 
file I was working on before.


>   popafr <- read.table(file="~/Documents/My PhD/Modelling/SAS/SAS datasets/SAS export sets/Final reference curve sets/SAS_for_R AFRO 11jan11.txt", header=T, sep="\t")

Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file '/home/me/Documents/My PhD/Modelling/SAS/SAS 
datasets/SAS export sets/Final reference curve sets/SAS_for_R AFRO 
11jan11.txt': No such file or directory

>

I get the same results with:
>popafr <- read.table(file.choose())
>popafr <- read.table(file="SAS_for_R AFRO 11jan11.txt",header=T, sep="\t")

The file is physically present in both the getwd() and the directory 
listed above even with the .choose option I receive the same error.



Does this have to do with file permissions? I've tried running sudo su 
and then starting R in a terminal with the same error.


any help is much appreciated.

Hopefully yours,
Daniel
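"No such file or directory" points at the path, not at permissions, so a hedged first check is to ask R what it can actually see (the path is the one from the post):

```r
p <- "~/Documents/My PhD/Modelling/SAS/SAS datasets/SAS export sets/Final reference curve sets/SAS_for_R AFRO 11jan11.txt"
file.exists(path.expand(p))          # FALSE: the path as typed is wrong
list.files(dirname(path.expand(p)))  # inspect the real names (case matters
                                     # on Linux, unlike Windows)
```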

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glmnet: obtain predictions using predict and also by extracting coefficients

2012-03-21 Thread Juliet Hannah
Oops. Coefficients are returned on the scale of the original data.

testX <- cbind(1,data.test)
yhat2  <- testX %*% beta

# works

plot(yhat2,yhat_enet)


On Wed, Mar 21, 2012 at 2:35 PM, Juliet Hannah  wrote:
> All,
>
> For my understanding, I wanted to see if I can get glmnet predictions
> using both the predict function and also by multiplying coefficients
> by the variable matrix. This is not worked out. Could anyone suggest
> where I am going wrong?
> I understand that I may not have the mean/intercept correct, but the
> scaling is also off, which suggests a bigger mistake.
>
>  Thanks for your help.
>
> Juliet Hannah
>
>
> library(ElemStatLearn)
> library(glmnet)
>
> data(prostate)
>
> # training data
> data.train <- prostate[prostate$train,]
> y <- data.train$lpsa
>
> # isolate predictors
> data.train <- as.matrix(data.train[,-c(9,10)])
>
> # test data
> data.test <- prostate[!prostate$train,]
> data.test <-  as.matrix(data.test[,-c(9,10)])
>
> # scale test data  by using means and sd from training data
>
> trainMeans <- apply(data.train,2,mean)
> trainSDs <- apply(data.train,2,sd)
>
> # create standardized test data
>
> data.test.std <- sweep(data.test, 2, trainMeans)
> data.test.std <- sweep(data.test.std, 2, trainSDs, "/")
>
> # fit training model
>
> myglmnet =cv.glmnet(data.train,y)
>
> # predictions by using predict function
>
> yhat_enet <- predict(myglmnet,newx=data.test, s="lambda.min")
>
> # attempting to get predictions by using coefficients
>
> beta  <- as.vector( t(coef(myglmnet,s="lambda.min")))
>
> testX <- cbind(1,data.test.std)
>
> yhat2  <- testX %*% beta
>
> # does not match
>
> plot(yhat2,yhat_enet)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Automaticall adjust axis scales

2012-03-21 Thread Alaios
Would it be possible to change the axes at the end?
My data sets tend to be quite large, so for the plot and the subsequent
lines I was thinking of always keeping track of the current minimum and
maximum, like:

plot(x)
minimum=min(x)
lines(x1)
minimum=c(minimum,x1)
lines(x2)
minimum=c(minimum,x2)

Then, if there is a good way of altering the axis limits afterwards, I
could set them to the values I want.
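Base graphics cannot rescale an already-drawn plot, so a hedged sketch of the usual approach is to track the running range while collecting the data and then redraw once with the final limits (x, x1, x2 are the objects from the post):

```r
xs <- list(x, x1, x2)
rng <- range(unlist(xs))               # overall minimum and maximum
plot(xs[[1]], ylim = rng, type = "l")  # draw once with the final limits
for (v in xs[-1]) lines(v)
```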
 
Regards
Alex
 


 From: Jim Lemon 
To: 

Sent: Tuesday, March 20, 2012 10:07 AM
Subject: Re: [R] Automaticall adjust axis scales
  
Alaios wrote:
> Dear all,
>
> I have made a function that given a number of list elements plot them to the 
> same window.
>
> The first element is plotted by using plot and all the rest are plotted under 
> the
>
> same window by using lines.
>
> I have below a small and simple reproducible example.
>
>
> x1<-c(1:10)
> plot(x1)
>
> x2<-c(11:20)
> lines(x2)
>
> x3<-c(31:40)
> lines(x3)
>
>
>
>
> as you might notice
> the two consecutive lines fail to be plotted as the axis were formed by the 
> first plot.
> Would it be possible after the last lines to change the axis to the minimum 
> and the maximum of all data sets to be visible?
>
> Any idea how I can do that?
>
>
Hi Alaios,
Try this:

ylim=range(c(x1,x2,x3))
plot(x1,ylim=ylim,type="l")
lines(x2)
lines(x3)

Jim
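
For the archives: base graphics cannot rescale the axes of a plot that has
already been drawn, so if the series really must be added one at a time, a
pattern along these lines (a sketch, not from the thread) keeps a running
range and redraws once at the end:

```r
# sketch: accumulate the y-range while drawing, then redraw once with
# limits that cover everything seen so far
x1 <- 1:10; x2 <- 11:20; x3 <- 31:40
yrange <- range(x1)
plot(x1, type = "l")
for (s in list(x2, x3)) {
  yrange <- range(yrange, s)   # running min and max
  lines(s)
}
plot(x1, type = "l", ylim = yrange)  # final redraw with full limits
lines(x2); lines(x3)
```

The extra redraw is the price of not knowing the range up front.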
[[alternative HTML version deleted]]



[R] glmnet: obtain predictions using predict and also by extracting coefficients

2012-03-21 Thread Juliet Hannah
All,

For my understanding, I wanted to see if I can get glmnet predictions
using both the predict function and also by multiplying coefficients
by the variable matrix. This has not worked out. Could anyone suggest
where I am going wrong?
I understand that I may not have the mean/intercept correct, but the
scaling is also off, which suggests a bigger mistake.

 Thanks for your help.

Juliet Hannah


library(ElemStatLearn)
library(glmnet)

data(prostate)

# training data
data.train <- prostate[prostate$train,]
y <- data.train$lpsa

# isolate predictors
data.train <- as.matrix(data.train[,-c(9,10)])

# test data
data.test <- prostate[!prostate$train,]
data.test <-  as.matrix(data.test[,-c(9,10)])

# scale test data  by using means and sd from training data

trainMeans <- apply(data.train,2,mean)
trainSDs <- apply(data.train,2,sd)

# create standardized test data

data.test.std <- sweep(data.test, 2, trainMeans)
data.test.std <- sweep(data.test.std, 2, trainSDs, "/")

# fit training model

myglmnet =cv.glmnet(data.train,y)

# predictions by using predict function

yhat_enet <- predict(myglmnet,newx=data.test, s="lambda.min")

# attempting to get predictions by using coefficients

beta  <- as.vector( t(coef(myglmnet,s="lambda.min")))

testX <- cbind(1,data.test.std)

yhat2  <- testX %*% beta

# does not match

plot(yhat2,yhat_enet)
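
A likely resolution (my reading, not confirmed in the thread): cv.glmnet()
standardizes predictors internally but reports coefficients back on the
original scale of x, so the manual route should use the raw, unstandardized
test matrix -- the hand-rolled scaling above is what breaks the match. A
minimal self-contained sketch:

```r
# sketch: coef() from cv.glmnet is on the original data scale, so it
# pairs with the raw design matrix plus a leading column of 1s
library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- rnorm(100)
fit  <- cv.glmnet(x, y)
beta <- as.vector(coef(fit, s = "lambda.min"))  # intercept comes first
yhat_manual  <- cbind(1, x) %*% beta
yhat_predict <- predict(fit, newx = x, s = "lambda.min")
all.equal(as.vector(yhat_manual), as.vector(yhat_predict))
```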



Re: [R] Type II and III sum of squares (R and SPSS)

2012-03-21 Thread peter dalgaard

On Mar 21, 2012, at 11:27 , Marco Tommasi wrote:

> To whom it may concern
> 
> I made some analyses with R using the command Anova. However, I found 
> some problems with the output obtained by selecting type II or type III 
> sum of squares.

Well, it would primarily concern the maintainer of the "car" package, which is 
the one containing the (capital-A) Anova() function. The type III SS don't look 
right to me either. With aov() we get

> M3l <- reshape(M3, direction="long", varying=c("b1","b2","b3"),sep="")
> summary(aov(b~fattA*factor(time)+ Error(factor(id)), M3l))

Error: factor(id)
  Df Sum Sq Mean Sq F value  Pr(>F)   
fattA  1 26.042  26.04216.3 0.00682 **
Residuals  6  9.583   1.597   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Error: Within
   Df Sum Sq Mean Sq F value   Pr(>F)
factor(time)2  42.33  21.167   23.81 6.65e-05 ***
fattA:factor(time)  2  20.33  10.167   11.44  0.00166 ** 
Residuals  12  10.67   0.889 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
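
An aside that may explain part of the discrepancy (an assumption on my part,
not verified against SPSS here): type III sums of squares are only meaningful
under sum-to-zero contrasts, which is effectively what SPSS uses; under R's
default treatment contrasts the intercept and between-factor SS come out
differently. A self-contained sketch using the data from the quoted post:

```r
# sketch: refit under sum-to-zero coding before asking car::Anova for
# type III SS (contr.sum is assumed to match SPSS's type III convention)
library(car)
M3 <- data.frame(b1 = c(10, 9, 8, 7,  9, 9, 8, 7),
                 b2 = c( 7, 6, 4, 5,  8, 7, 9, 7),
                 b3 = c( 3, 2, 3, 4,  7, 8, 8, 6),
                 fattA = factor(rep(1:2, each = 4)))
d2 <- data.frame(fB = factor(1:3))
options(contrasts = c("contr.sum", "contr.poly"))
f5 <- lm(cbind(b1, b2, b3) ~ fattA, data = M3)
a6 <- Anova(f5, idata = d2, idesign = ~fB, type = 3)
summary(a6, multivariate = FALSE)
```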



> 
> Briefly, I have to do a 2x3 mixed model anova, wherein the first factor 
> is a between factor and the second factor is a within factor. I use the 
> command Anova in the list below, because I want to obtain also the sum 
> of squares of the linear and quadratic contrast between the levels of 
> the within factor.
> 
> 
> 
> 
> Below I report the list of commands used in R ("fattA" is the between 
> factor and "fB" is the within factor):
> 
>> a1b1<-c(10,9,8,7)
>> a1b2<-c(7,6,4,5)
>> a1b3<-c(3,2,3,4)
>> a2b1<-c(9,9,8,7)
>> a2b2<-c(8,7,9,7)
>> a2b3<-c(7,8,8,6)
>> 
>> M3<-matrix(0,8,4)
>> M3[,1]<-cbind(a1b1,a2b1)
>> M3[,2]<-cbind(a1b2,a2b2)
>> M3[,3]<-cbind(a1b3,a2b3)
>> M3[,4]<-rep(c(1,2),each=4)
>> 
>> colnames(M3)<-c("b1","b2","b3","fattA")
>> 
>> M3<-as.data.frame(M3)
>> 
>> require(car)
> Loading required package: car
> Loading required package: MASS
> Loading required package: nnet
>> f5<-lm(cbind(b1,b2,b3)~fattA,data=M3)
>> a5<-Anova(f5)
> 
>> f6<-lm(c(b1,b2,b3)~rep(fattA,3),data=M3)
>> 
>> 
>> fB<-factor(c(1:3))
>> d2<-data.frame(fB)
>> a6<-Anova(f5,idata=d2,idesign=~fB,type=2)
> 
>> summary(a6,multivariate=F)
> 
> Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> 
>  SS num Df Error SS den Df   FPr(>F)
> (Intercept) 1080.04  1   9.5833  6 676.200 2.133e-07 ***
> fattA 26.04  1   9.5833  6  16.304  0.006819 **
> fB42.33  2  10.6667 12  23.812 6.645e-05 ***
> fattA:fB  20.33  2  10.6667 12  11.438  0.001660 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> 
> Mauchly Tests for Sphericity
> 
>  Test statistic p-value
> fB  0.87891  0.7242
> fattA:fB0.87891  0.7242
> 
> 
> Greenhouse-Geisser and Huynh-Feldt Corrections
>  for Departure from Sphericity
> 
>   GG eps Pr(>F[GG])
> fB   0.89199  0.0001474 ***
> fattA:fB 0.89199  0.0026452 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
>  HF eps Pr(>F[HF])
> fB   1.2438  6.645e-05 ***
> fattA:fB 1.24380.00166 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> Warning message:
> In summary.Anova.mlm(a6, multivariate = F) : HF eps > 1 treated as 1
> 
> 
> I repeated the analysis by setting type III sum of squares and I obtained:
> 
>> a6<-Anova(f5,idata=d2,idesign=~fB,type=3)
>> summary(a6,multivariate=F)
> 
> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
> 
> SS num Df Error SS den Df  F   Pr(>F)
> (Intercept) 30.817  1   9.5833  6 19.294 0.004606 **
> fattA   26.042  1   9.5833  6 16.304 0.006819 **
> fB  40.133  2  10.6667 12 22.575 8.57e-05 ***
> fattA:fB20.333  2  10.6667 12 11.438 0.001660 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> 
> Mauchly Tests for Sphericity
> 
>  Test statistic p-value
> fB  0.87891  0.7242
> fattA:fB0.87891  0.7242
> 
> 
> Greenhouse-Geisser and Huynh-Feldt Corrections
>  for Departure from Sphericity
> 
>   GG eps Pr(>F[GG])
> fB   0.89199  0.0001851 ***
> fattA:fB 0.89199  0.0026452 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
>  HF eps Pr(>F[HF])
> fB   1.2438   8.57e-05 ***
> fattA:fB 1.24380.00166 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> Warning message:
> In summary.Anova.mlm(a6, multivariate = F) : HF eps > 1 treated as 1
> 
> 
> As you can see, the sums of squares of the within factor "fB" and of the 
> intercept obtained by setting type II sum of squares are different from 
> those obtained by setting type III sum of squares. I repeated the 
> analysis by using SPSS (type II and III) and I obtained the same sums of 
> squares for both types, whi

Re: [R] Unable to specify order of a factor

2012-03-21 Thread David Winsemius


On Mar 21, 2012, at 1:07 PM, Justin Montemarano wrote:


Hi all:

I've got it... it appears that total.density was also defined in two
separate data frames (se.predict.data and dc.predict.data) with levels
order 16, 32, 8. Using relevel(), I moved 8 to the first position  
and it's

solved the plotting problem.

Ista's 'minimal' reproducible code request prompted me to discover my
error; thanks all.


I've had the experience in the last few years that almost all of my  
questions to Rhelp have needed to be peacefully euthanized after being  
subjected to the rack of hammering into a "reproducible" condition.


--
David.



-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com


On Wed, Mar 21, 2012 at 12:42 PM, R. Michael Weylandt <
michael.weyla...@gmail.com> wrote:


You'll also want to use dput() to send us an exact encoding of your
data when making that reproducible example: there might be something
subtle at play here that print methods won't show.

Michael

On Wed, Mar 21, 2012 at 12:28 PM, Ista Zahn   
wrote:
On Wed, Mar 21, 2012 at 12:00 PM, Justin Montemarano >

wrote:

Ista:

Your attached code did work for me; moreover, the facets were  
presented

in
the desired order with facet_wrap() and facet_grid(), which is  
what I'm

using because I have a second factor used in facet_grid().

Still, my plots with total.density as a facet are coming out in  
16, 32,

8,

and I'm not seeing why.  Below is my plot code -


ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y =
per.remain)) + facet_grid(total.density ~ prop.ec) +
   #add point and error bar data
   theme_set(theme_bw()) +
   geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax =
per.remain + se), width = 3) +
   #add predicted model data
   geom_line(data = se.predict.data[se.predict.data$plant.sp ==

'EC',],

aes(x = x.values, y = predicted.values), colour = c('red')) +
   geom_line(data = dc.predict.data[dc.predict.data$plant.sp ==

'EC',],
aes(x = x.values, y = predicted.values), colour = c('blue'),  
linetype =

c('dashed')) +

   xlab('Day') + ylab('Percent Mass Remaining') +

opts(panel.grid.major =

theme_blank(), panel.grid.minor = theme_blank())


Is there anything odd about it that might be producing the odd  
ordering

problem?  FYI, avoiding subsetting ag.tab doesn't do the trick.


I don't know. Please create a minimal example that isolates the
problem. You can start with

levels(ag.tab$total.density)

ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y =

per.remain)) +

  facet_grid(total.density ~ prop.ec) +
  geom_point()

Best,
Ista


-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com


On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn   
wrote:


Hi Justin,

this gives the correct order (8, 16, 32) on my machine:

total.density <-


c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)

total.density <- factor(total.density, levels=c(8, 16, 32),

ordered=TRUE)

str(total.density)

order(levels(total.density))

dat <- data.frame(td = total.density, v1 =

rnorm(1:length(total.density)))


ggplot(dat, aes(x = v1)) +
geom_density() +
facet_wrap(~td)

Does it work for you? If yes, then you need to tell us what you're
doing that is different from this example. If no, please give  
use the

output of sessionInfo().

best,
Ista

On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano <

jmont...@kent.edu>

wrote:
I think I understand, but I believe my original interest is in  
the

order

of
levels(total.density), since ggplot appears to be using that to  
order

the
facets.  Thus, I'm still getting three graphs, ordered (and  
displayed

as)
16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32.   
I'm

sorry

if
I wasn't clear and/or I've missed your message.
-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com

  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible  
code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]


Re: [R] Unable to specify order of a factor

2012-03-21 Thread Justin Montemarano
Hi all:

I've got it... it appears that total.density was also defined in two
separate data frames (se.predict.data and dc.predict.data) with levels
order 16, 32, 8. Using relevel(), I moved 8 to the first position and it's
solved the plotting problem.

Ista's 'minimal' reproducible code request prompted me to discover my
error; thanks all.

-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com
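
For the archives, the likely mechanism (my guess, consistent with the
symptom): once the densities pass through character conversion, the default
factor levels sort lexically, which puts "16" before "8". A base-R sketch:

```r
# sketch: character levels sort lexically, so "16" < "32" < "8"
td <- factor(as.character(c(8, 16, 32)))
levels(td)                                  # "16" "32" "8"
# fix by naming the levels in the order you want ...
td_fixed <- factor(td, levels = c("8", "16", "32"))
# ... or, as above, move "8" to the front with relevel()
td_rel <- relevel(td, ref = "8")
levels(td_fixed)
levels(td_rel)
```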


On Wed, Mar 21, 2012 at 12:42 PM, R. Michael Weylandt <
michael.weyla...@gmail.com> wrote:

> You'll also want to use dput() to send us an exact encoding of your
> data when making that reproducible example: there might be something
> subtle at play here that print methods won't show.
>
> Michael
>
> On Wed, Mar 21, 2012 at 12:28 PM, Ista Zahn  wrote:
> > On Wed, Mar 21, 2012 at 12:00 PM, Justin Montemarano 
> wrote:
> >> Ista:
> >>
> >> Your attached code did work for me; moreover, the facets were presented
> in
> >> the desired order with facet_wrap() and facet_grid(), which is what I'm
> >> using because I have a second factor used in facet_grid().
> >>
> >> Still, my plots with total.density as a facet are coming out in 16, 32,
> 8,
> >> and I'm not seeing why.  Below is my plot code -
> >>
> >>> ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y =
> >>> per.remain)) + facet_grid(total.density ~ prop.ec) +
> >>> #add point and error bar data
> >>> theme_set(theme_bw()) +
> >>> geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax =
> >>> per.remain + se), width = 3) +
> >>> #add predicted model data
> >>> geom_line(data = se.predict.data[se.predict.data$plant.sp ==
> 'EC',],
> >>> aes(x = x.values, y = predicted.values), colour = c('red')) +
> >>> geom_line(data = dc.predict.data[dc.predict.data$plant.sp ==
> 'EC',],
> >>> aes(x = x.values, y = predicted.values), colour = c('blue'), linetype =
> >>> c('dashed')) +
> >>>
> >>> xlab('Day') + ylab('Percent Mass Remaining') +
> opts(panel.grid.major =
> >>> theme_blank(), panel.grid.minor = theme_blank())
> >>
> >> Is there anything odd about it that might be producing the odd ordering
> >> problem?  FYI, avoiding subsetting ag.tab doesn't do the trick.
> >
> > I don't know. Please create a minimal example that isolates the
> > problem. You can start with
> >
> > levels(ag.tab$total.density)
> >
> > ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y =
> per.remain)) +
> >facet_grid(total.density ~ prop.ec) +
> >geom_point()
> >
> > Best,
> > Ista
> >
> >> -
> >> Justin Montemarano
> >> Graduate Student
> >> Kent State University - Biological Sciences
> >>
> >> http://www.montegraphia.com
> >>
> >>
> >> On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn  wrote:
> >>>
> >>> Hi Justin,
> >>>
> >>> this gives the correct order (8, 16, 32) on my machine:
> >>>
> >>> total.density <-
> >>>
> >>>
> c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
> >>> total.density <- factor(total.density, levels=c(8, 16, 32),
> ordered=TRUE)
> >>> str(total.density)
> >>>
> >>> order(levels(total.density))
> >>>
> >>> dat <- data.frame(td = total.density, v1 =
> rnorm(1:length(total.density)))
> >>>
> >>> ggplot(dat, aes(x = v1)) +
> >>>  geom_density() +
> >>>  facet_wrap(~td)
> >>>
> >>> Does it work for you? If yes, then you need to tell us what you're
> >>> doing that is different from this example. If no, please give use the
> >>> output of sessionInfo().
> >>>
> >>> best,
> >>> Ista
> >>>
> >>> On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano <
> jmont...@kent.edu>
> >>> wrote:
> >>> > I think I understand, but I believe my original interest is in the
> order
> >>> > of
> >>> > levels(total.density), since ggplot appears to be using that to order
> >>> > the
> >>> > facets.  Thus, I'm still getting three graphs, ordered (and displayed
> >>> > as)
> >>> > 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32.  I'm
> sorry
> >>> > if
> >>> > I wasn't clear and/or I've missed your message.
> >>> > -
> >>> > Justin Montemarano
> >>> > Graduate Student
> >>> > Kent State University - Biological Sciences
> >>> >
> >>> > http://www.montegraphia.com
> >>> >
> >>> >[[alternative HTML version deleted]]
> >>> >
> >>> > __
> >>> > R-help@r-project.org mailing list
> >>> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> > PLEASE do read the posting guide
> >>> > http://www.R-project.org/posting-guide.html
> >>> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible

Re: [R] "cannot change working directory"

2012-03-21 Thread R. Michael Weylandt
list.dirs() will list the directories in that folder exactly as R sees
them. If you are getting "cannot change working directory" messages I can
only think of two reasons:

i) File permissions (though that's unlikely to get messed up unless
there's someone around who knows how to do that in the first place)
ii) You don't actually have the right wd name.

So like I said, please give the output of

list.dirs("/Users/user/Desktop/Recorded results/", recursive = FALSE)

and then try something like

setwd(list.dirs("/Users/user/Desktop/Recorded results/", recursive =
FALSE)[XXX])

where XXX is the number of the path that gives the Wednesday files.
normalizePath() might also help.

If you really want to check the file permissions route, this will be a start

system("ls -hFGlxo /Users/user/Desktop/Recorded results")

but you'll probably need some help to interpret the output.

Michael
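
A few base-R checks that narrow a failing setwd() down before guessing (the
path here is the poster's; adjust as needed):

```r
# sketch: diagnose setwd() failures -- wrong path vs. missing permission
p <- "/Users/user/Desktop/Recorded results/wednesday"
file.exists(p)                      # FALSE => the path itself is wrong
file.access(p, mode = 1) == 0       # FALSE => no search (execute) permission
normalizePath(p, mustWork = FALSE)  # how the OS resolves the path
```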

On Wed, Mar 21, 2012 at 12:30 PM, Carmela Friedman
 wrote:
> Could you explain what that is, please? I am new to R
> On 21 Mar 2012, at 16:27, R. Michael Weylandt wrote:
>
>> Scratch that -- while true, it doesn't seem to cause that error message.
>>
>> Can you do this?
>>
>> list.dirs("/Users/user/Desktop/Recorded results/")
>>
>> Also, are you sure you have various access permissions to the Wednesday 
>> folder?
>>
>> Michael
>>
>> On Wed, Mar 21, 2012 at 12:25 PM, R. Michael Weylandt
>>  wrote:
>>> If you are on a mac, you don't need to escape spaces within setwd():
>>> e.g., on my machine,  setwd("~/Desktop/Current Semester") works just
>>> fine.
>>>
>>> Michael
>>>
>>> On Wed, Mar 21, 2012 at 12:08 PM, mela  wrote:
 Hi,

 I need to write a program to run some results from an experiment. I have the
 results saved in 2 different folders, named "tuesday" and "wednesday" (both
 located within the same parent folder). When I wrote the program for
 "tuesday" I had no problem running it, but when I changed the working
 directory to wednesday I got the "cannot change working directory" message.
 I don't understand what's different between the two.

 my first code was:

 setwd("/Users/user/Desktop/Recorded\ results/tuesday")

 and my second code is:

 setwd("/Users/user/Desktop/Recorded\ results/wednesday")

 and I copied the exact location from the terminal, so I can't have typos (I
 am using a Mac, if that makes any difference).

 any suggestions? i will be grateful for any insight.

 thanks

 mela


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/cannot-change-working-directory-tp4492812p4492812.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Check results between two data.frame

2012-03-21 Thread Sarah Goslee
On Wed, Mar 21, 2012 at 12:48 PM, HJ YAN  wrote:
> Thanks a lot Sarah, for your nice example and code.
>
> I know '==' can do the work.  Just as an R beginner, sometimes I really want
> to be more like a 'real programmer', if you know what I mean...

Being a "real programmer" means using one line of code instead of a
dozen, when that one line is more elegant and understandable.

Being a real programmer also means learning how to debug your code: a
good first step is to set x and y to actual matrices and try each step
in the console. (And don't give them default values that are strings
if your function expects them to be matrices.)

If you'd actually tried to run each line, you would quickly discover
that as.matrix() and matrix() do not do the same thing, and you need
the latter.

> NROW
[1] 5
> NCOL
[1] 4
> CHECK_XY <- as.matrix(NA, NROW, NCOL)
> CHECK_XY
 [,1]
[1,]   NA
>

> CHECK_XY <- matrix(NA, NROW, NCOL)
> CHECK_XY
 [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA
[4,]   NA   NA   NA   NA
[5,]   NA   NA   NA   NA
>
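
For the original goal -- a matrix flagging which cells differ -- the whole
double loop can be replaced by one vectorized call (a sketch, assuming x and
y are numeric matrices of the same shape):

```r
# sketch: vectorized comparison instead of nested for loops
x <- matrix(1:20, ncol = 4)
y <- x
y[y > 18] <- 22                                # perturb two cells
check_xy <- ifelse(x == y, "good", "check")    # character matrix of flags
which(x != y, arr.ind = TRUE)                  # row/col of each mismatch
```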


> If any R expert could help to check where my code went wrong here that would
> be very grateful!

Getting help from "any R expert" is far more likely if you copy your
messages to the R-help list as well as just to me. I've taken the
liberty of doing so with my reply.

Sarah


> Many thanks!
> HJ
> On Wed, Mar 21, 2012 at 4:13 PM, Sarah Goslee 
> wrote:
>>
>> As long as == is an appropriate test for your data, why not just use
>> R's innate ability to handle matrices/data frames?
>>
>> > x1 <- matrix(1:20, ncol=4)
>> > x2 <- ifelse(x1 > 18, 22, x1)
>> > x1
>>     [,1] [,2] [,3] [,4]
>> [1,]    1    6   11   16
>> [2,]    2    7   12   17
>> [3,]    3    8   13   18
>> [4,]    4    9   14   19
>> [5,]    5   10   15   20
>> > x2
>>     [,1] [,2] [,3] [,4]
>> [1,]    1    6   11   16
>> [2,]    2    7   12   17
>> [3,]    3    8   13   18
>> [4,]    4    9   14   22
>> [5,]    5   10   15   22
>> > x1 == x2
>>     [,1] [,2] [,3]  [,4]
>> [1,] TRUE TRUE TRUE  TRUE
>> [2,] TRUE TRUE TRUE  TRUE
>> [3,] TRUE TRUE TRUE  TRUE
>> [4,] TRUE TRUE TRUE FALSE
>> [5,] TRUE TRUE TRUE FALSE
>>
>>
>> On Wed, Mar 21, 2012 at 8:48 AM, HJ YAN  wrote:
>> > Dear R-user,
>> >
>> > I'm trying to compare two sets of results and wanted to find out which
>> > element in the two data frame/matrix are different.
>> >
>> > I wrote the following function and it works ok, and gives me a long list
>> > of
>> > "good" as outcomes.
>> >
>> >
>> > CHECK<-
>> > function (x = "file1", y = "file2")
>> > {
>> >    for (i in 1:nrow(x)) {
>> >        for (j in 1:ncol(x)) {
>> >            if (x[i, j] == y[i, j]) {
>> >                print("good")
>> >            }
>> >            else {
>> >                print("check")
>> >            }
>> >        }
>> >    }
>> > }
>> >
>> >
>> > However, as the two datasets I was comparing are large (400*100
>> > roughly),
>> > so I would like to create a matrix to identify which ones are not same
>> > in
>> > the two dataframes.
>> >
>> > So I added 'CHECK_XY' in my code but  when I run it, I got 'Error in
>> > CHECK_XY[i, j] = c("good") : subscript out of bounds'.
>> >
>> > Could anyone help please??
>> >
>> > CHECK_1<-
>> > function (x = "file1", y = "file2")
>> > {
>> >    NROW <- nrow(x)
>> >    NCOL <- ncol(x)
>> >    CHECK_XY <- as.matrix(NA, NROW, NCOL)
>> >    for (i in 1:nrow(x)) {
>> >        for (j in 1:ncol(x)) {
>> >            if (x[i, j] == y[i, j]) {
>> >                CHECK_XY[i, j] = c("good")
>> >            }
>> >            else {
>> >                CHECK_XY[i, j] = c("check")
>> >            }
>> >        }
>> >    }
>> >    print(CHECK_XY)
>> > }
>> >
>> > Thanks!
>> > HJ
>>

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Unable to specify order of a factor

2012-03-21 Thread R. Michael Weylandt
You'll also want to use dput() to send us an exact encoding of your
data when making that reproducible example: there might be something
subtle at play here that print methods won't show.

Michael

On Wed, Mar 21, 2012 at 12:28 PM, Ista Zahn  wrote:
> On Wed, Mar 21, 2012 at 12:00 PM, Justin Montemarano  
> wrote:
>> Ista:
>>
>> Your attached code did work for me; moreover, the facets were presented in
>> the desired order with facet_wrap() and facet_grid(), which is what I'm
>> using because I have a second factor used in facet_grid().
>>
>> Still, my plots with total.density as a facet are coming out in 16, 32, 8,
>> and I'm not seeing why.  Below is my plot code -
>>
>>> ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y =
>>> per.remain)) + facet_grid(total.density ~ prop.ec) +
>>>     #add point and error bar data
>>>     theme_set(theme_bw()) +
>>>     geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax =
>>> per.remain + se), width = 3) +
>>>     #add predicted model data
>>>     geom_line(data = se.predict.data[se.predict.data$plant.sp == 'EC',],
>>> aes(x = x.values, y = predicted.values), colour = c('red')) +
>>>     geom_line(data = dc.predict.data[dc.predict.data$plant.sp == 'EC',],
>>> aes(x = x.values, y = predicted.values), colour = c('blue'), linetype =
>>> c('dashed')) +
>>>
>>>     xlab('Day') + ylab('Percent Mass Remaining') + opts(panel.grid.major =
>>> theme_blank(), panel.grid.minor = theme_blank())
>>
>> Is there anything odd about it that might be producing the odd ordering
>> problem?  FYI, avoiding subsetting ag.tab doesn't do the trick.
>
> I don't know. Please create a minimal example that isolates the
> problem. You can start with
>
> levels(ag.tab$total.density)
>
> ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = per.remain)) +
>    facet_grid(total.density ~ prop.ec) +
>    geom_point()
>
> Best,
> Ista
>
>> -
>> Justin Montemarano
>> Graduate Student
>> Kent State University - Biological Sciences
>>
>> http://www.montegraphia.com
>>
>>
>> On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn  wrote:
>>>
>>> Hi Justin,
>>>
>>> this gives the correct order (8, 16, 32) on my machine:
>>>
>>> total.density <-
>>>
>>> c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
>>> total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE)
>>> str(total.density)
>>>
>>> order(levels(total.density))
>>>
>>> dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density)))
>>>
>>> ggplot(dat, aes(x = v1)) +
>>>  geom_density() +
>>>  facet_wrap(~td)
>>>
>>> Does it work for you? If yes, then you need to tell us what you're
>>> doing that is different from this example. If no, please give use the
>>> output of sessionInfo().
>>>
>>> best,
>>> Ista
>>>
>>> On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano 
>>> wrote:
>>> > I think I understand, but I believe my original interest is in the order
>>> > of
>>> > levels(total.density), since ggplot appears to be using that to order
>>> > the
>>> > facets.  Thus, I'm still getting three graphs, ordered (and displayed
>>> > as)
>>> > 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32.  I'm sorry
>>> > if
>>> > I wasn't clear and/or I've missed your message.
>>> > -
>>> > Justin Montemarano
>>> > Graduate Student
>>> > Kent State University - Biological Sciences
>>> >
>>> > http://www.montegraphia.com
>>> >
>>> >        [[alternative HTML version deleted]]
>>> >
>>> > __
>>> > R-help@r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide
>>> > http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] passing xlim to coord_map in ggplot2

2012-03-21 Thread Ista Zahn
Hi Zack,

This works as expected on my machine:


tank_trunc <- read.csv("http://r.789695.n4.nabble.com/file/n4492267/upton_tank_trunc_nabble.csv")

michigan_map.df <- map_data('county', 'michigan')
ggplot() + geom_point(aes(lon, lat), data = tank_trunc, na.rm = T) +
geom_path(aes(long, lat, group = group), data = michigan_map.df) +
coord_map('gilbert', xlim = c(-88, -82))

> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] mapproj_1.1-8.3 maps_2.2-5  ggplot2_0.9.0

loaded via a namespace (and not attached):
 [1] MASS_7.3-17        RColorBrewer_1.0-5 colorspace_1.1-1   dichromat_1.2-4
 [5] digest_0.5.2       grid_2.14.2        memoise_0.1        munsell_0.3
 [9] plyr_1.7.1         proto_0.3-9.2      reshape2_1.2.1     scales_0.2.0
[13] stringr_0.6        tools_2.14.2


Maybe you have some out-of-date packages?

Best,
Ista

On Wed, Mar 21, 2012 at 9:28 AM, z2.0  wrote:
> The first block of code should be reproducible.
>
> For the second block, you need only a data.frame. I've included a few rows
> from the one I'm working with.
>
> Two required libraries: maps, ggplot2.
> http://r.789695.n4.nabble.com/file/n4492267/upton_tank_trunc_nabble.csv
> upton_tank_trunc_nabble.csv
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/passing-xlim-to-coord-map-in-ggplot2-tp4490005p4492267.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Forloop/ifelse program problem and list of dataframes

2012-03-21 Thread Ivan Calandra

Hi Ian,

I haven't read it in detail because you don't provide a reproducible 
example (see ?dput), but you might want to take a look at the aggregate() 
and doBy::summaryBy() functions.


HTH,
Ivan

--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06
ivan.calan...@u-bourgogne.fr
http://biogeosciences.u-bourgogne.fr/calandra
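
To make the suggestion concrete, here is a hedged sketch of the aggregate()
route with toy data (column names taken from the post; the real data frame is
assumed to look roughly like this): sum the distances per
participant/status/date, average those daily sums per participant/status,
then reshape to one row per participant.

```r
# sketch: aggregate() twice, then reshape() to participant x status
set.seed(42)
toy <- data.frame(
  gpsarc     = rep(c("p1", "p2"), each = 6),                 # participant
  testpo     = rep(c("a", "b"), times = 6),                  # status
  LocalDate  = rep(c("d1", "d2", "d3"), each = 2, times = 2),
  Dist_Bef_m = runif(12, 0, 100)
)
daily    <- aggregate(Dist_Bef_m ~ gpsarc + testpo + LocalDate, toy, sum)
bystatus <- aggregate(Dist_Bef_m ~ gpsarc + testpo, daily, mean)
wide     <- reshape(bystatus, idvar = "gpsarc", timevar = "testpo",
                    direction = "wide")
wide  # one row per participant, one column per status
```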


On 21/03/12 16:44, Ian Craig wrote:

Hello R Community,

I don't post to these things often, so excuse me if I stumble on my forum
etiquette.  This is a complex problem for me, which may require two forum
entries, but I will try my best to be concise.  Also, I am a self-taught
coder, so if my code is not to convention, your constructive criticism is
always welcome.

I need to split up a data frame by participant (gpsarc - factor), status
(testpo - factor), and by date (LocalDate), then sum the distance
(Dist_Bef_m - numeric) records for each date and average them across each
status state for each participant.  Each participant has several records
for each date and for at least 1 of 3 different possible status types (max
3).   In the end, I want a table with participant number and the status
state as my column headings with means for each status state under their
appropriate heading (see example below).  I am confident I made this way
more complicated than it needs to be.  I really appreciate any help you can
offer.

Here is my relevant coding so far:

s1 <- split(data[, c(4, 10, 20, 42)], data$gpsarc)
for (i in 1:length(s1))
    s1[[i]] <- split(s1[[i]], s1[[i]]$testpo)
s2 <- vector("list", length(s1))
for (i in 1:length(s2))
    s2[[i]] <- ldply(s1[[i]],
        function(x) {
            # if one status state does not exist, but is still accounted for in
            # the split sublist because it is a factor, I would get an error,
            # so I added this if/else portion to remove entries with no records
            if (nrow(x) == 0) {
                remove(x)
            }
            else {
                by(x, x[["LocalDate"]],
                    function(x1) {
                        sum(x1[["Dist_Bef_m"]])
                    })
            }
        })

s3<- vector("list", length(s2))
   for(i in 1:length(s3))
   s3[[i]]<- data.frame(mean = apply(s2[[i]][,-1],1,mean,na.rm=TRUE),
row.names = as.character(s2[[i]][,1]))

here is a sample of the s3 result:

[[1]]
  mean
2 12533.2

[[2]]
   mean
2 26300.96
3 25313.93

[[3]]
   mean
1 48489.15
3 27398.23

[[4]]
   mean
1 34783.97

[[5]]
   mean
1 21293.19
2 21962.41
3 18272.67

# I really want it to look like this:

ppt        1        2        3
1         NA  12533.2        NA
2         NA 26300.96  25313.93
3   48489.15       NA  27398.23
4   34783.97       NA        NA
5   21293.19 21962.41  18272.67








Re: [R] "cannot change working directory"

2012-03-21 Thread R. Michael Weylandt
If you are on a mac, you don't need to escape spaces within setwd():
e.g., on my machine,  setwd("~/Desktop/Current Semester") works just
fine.

Michael

On Wed, Mar 21, 2012 at 12:08 PM, mela  wrote:
> Hi,
>
> I need to write  program to run some results from an experiment. i have the
> results saved in 2 different files, named "tuesday" and "wednesday" (both
> located within the same file). when i wrote the program for "tuesday" i had
> no problem running it, but when i changed the work directory to wednesday i
> got the "cannot change working directory" message. I don't understand what's
> different between the two.
>
> my first code was:
>
> setwd("/Users/user/Desktop/Recorded\ results/tuesday")
>
> and my second code is:
>
> setwd("/Users/user/Desktop/Recorded\ results/wednesday")
>
> and i copied the exact location from the terminal, so i can't have typos (i
> am using a mac, if that makes any difference).
>
> any suggestions? i will be grateful for any insight.
>
> thanks
>
> mela
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/cannot-change-working-directory-tp4492812p4492812.html
> Sent from the R help mailing list archive at Nabble.com.
>



Re: [R] Unable to specify order of a factor

2012-03-21 Thread Ista Zahn
On Wed, Mar 21, 2012 at 12:00 PM, Justin Montemarano  wrote:
> Ista:
>
> Your attached code did work for me; moreover, the facets were presented in
> the desired order with facet_wrap() and facet_grid(), which is what I'm
> using because I have a second factor used in facet_grid().
>
> Still, my plots with total.density as a facet are coming out in 16, 32, 8,
> and I'm not seeing why.  Below is my plot code -
>
>> ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y =
>> per.remain)) + facet_grid(total.density ~ prop.ec) +
>>     #add point and error bar data
>>     theme_set(theme_bw()) +
>>     geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax =
>> per.remain + se), width = 3) +
>>     #add predicted model data
>>     geom_line(data = se.predict.data[se.predict.data$plant.sp == 'EC',],
>> aes(x = x.values, y = predicted.values), colour = c('red')) +
>>     geom_line(data = dc.predict.data[dc.predict.data$plant.sp == 'EC',],
>> aes(x = x.values, y = predicted.values), colour = c('blue'), linetype =
>> c('dashed')) +
>>
>>     xlab('Day') + ylab('Percent Mass Remaining') + opts(panel.grid.major =
>> theme_blank(), panel.grid.minor = theme_blank())
>
> Is there anything odd about it that might be producing the odd ordering
> problem?  FYI, avoiding subsetting ag.tab doesn't do the trick.

I don't know. Please create a minimal example that isolates the
problem. You can start with

levels(ag.tab$total.density)

ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = per.remain)) +
facet_grid(total.density ~ prop.ec) +
geom_point()
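If levels(ag.tab$total.density) comes back as "16", "32", "8", releveling before plotting should restore the intuitive facet order; a sketch, assuming total.density is a factor:

```r
ag.tab$total.density <- factor(ag.tab$total.density, levels = c(8, 16, 32))
```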

Best,
Ista

> -
> Justin Montemarano
> Graduate Student
> Kent State University - Biological Sciences
>
> http://www.montegraphia.com
>
>
> On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn  wrote:
>>
>> Hi Justin,
>>
>> this gives the correct order (8, 16, 32) on my machine:
>>
>> total.density <-
>>
>> c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
>> total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE)
>> str(total.density)
>>
>> order(levels(total.density))
>>
>> dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density)))
>>
>> ggplot(dat, aes(x = v1)) +
>>  geom_density() +
>>  facet_wrap(~td)
>>
>> Does it work for you? If yes, then you need to tell us what you're
>> doing that is different from this example. If no, please give use the
>> output of sessionInfo().
>>
>> best,
>> Ista
>>
>> On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano 
>> wrote:
>> > I think I understand, but I believe my original interest is in the order
>> > of
>> > levels(total.density), since ggplot appears to be using that to order
>> > the
>> > facets.  Thus, I'm still getting three graphs, ordered (and displayed
>> > as)
>> > 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32.  I'm sorry
>> > if
>> > I wasn't clear and/or I've missed your message.
>> > -
>> > Justin Montemarano
>> > Graduate Student
>> > Kent State University - Biological Sciences
>> >
>> > http://www.montegraphia.com
>> >
>> >
>
>



Re: [R] list of matrices

2012-03-21 Thread R. Michael Weylandt
You probably want

mylist[[i]] = function(...)

in your loop.

Similarly, when you want them again later, you need to use double
square brackets.

I once heard a useful metaphor for understanding the difference
between [ and [[ when it comes to lists.

If x (a list) is a train, then x[2] is the second car of that train,
while x[[2]] are the contents of that second car.

That's why things like, x[1:3] are well defined (the sublist
containing elements 1 through 3) while x[[1:3]] are not.
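A short session illustrating the metaphor:

```r
x <- list(a = 1:3, b = letters, c = matrix(1:4, 2))
x[3]    # the third car: a one-element list still wrapping the matrix
x[[3]]  # the contents of that car: the matrix itself
x[1:2]  # the sublist holding elements 1 and 2
```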

Hope this helps,
Michael

On Wed, Mar 21, 2012 at 11:08 AM, David Zastrau  wrote:
> Hello dear R-users,
>
> I am currently trying to fill some datasatructure (array, list, hash...)
> with matrices that are calculated by a function and do vary in size:
>
> mylist = list()
> for(i in 1:n)
>    mylist[i] = function(...) # returns a matrix
>
> print(mylist[1]) # prints only the first element of the matrix
>
>
> How can I store and afterwards access the matrices that are calculated by my
> function?
>
> Any help would be greatly appreciated!
>
> Best Wishes
> David
>



Re: [R] "cannot change working directory"

2012-03-21 Thread R. Michael Weylandt
Scratch that -- while true, it doesn't seem to cause that error message.

Can you do this?

list.dirs("/Users/user/Desktop/Recorded results/")

Also, are you sure you have various access permissions to the Wednesday folder?
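A couple of quick checks that can narrow this down (the path is the one from the thread):

```r
p <- "/Users/user/Desktop/Recorded results/wednesday"
file.exists(p)      # does the path exist at all (watch for stray characters)?
file.info(p)$isdir  # TRUE if it really is a directory
file.access(p, 4)   # 0 if readable, -1 if permissions are the problem
```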

Michael

On Wed, Mar 21, 2012 at 12:25 PM, R. Michael Weylandt
 wrote:
> If you are on a mac, you don't need to escape spaces within setwd():
> e.g., on my machine,  setwd("~/Desktop/Current Semester") works just
> fine.
>
> Michael
>
> On Wed, Mar 21, 2012 at 12:08 PM, mela  wrote:
>> Hi,
>>
>> I need to write  program to run some results from an experiment. i have the
>> results saved in 2 different files, named "tuesday" and "wednesday" (both
>> located within the same file). when i wrote the program for "tuesday" i had
>> no problem running it, but when i changed the work directory to wednesday i
>> got the "cannot change working directory" message. I don't understand what's
>> different between the two.
>>
>> my first code was:
>>
>> setwd("/Users/user/Desktop/Recorded\ results/tuesday")
>>
>> and my second code is:
>>
>> setwd("/Users/user/Desktop/Recorded\ results/wednesday")
>>
>> and i copied the exact location from the terminal, so i can't have typos (i
>> am using a mac, if that makes any difference).
>>
>> any suggestions? i will be grateful for any insight.
>>
>> thanks
>>
>> mela
>>
>>
>> --
>> View this message in context: 
>> http://r.789695.n4.nabble.com/cannot-change-working-directory-tp4492812p4492812.html
>> Sent from the R help mailing list archive at Nabble.com.
>>



Re: [R] list of matrices

2012-03-21 Thread Sarah Goslee
Use
mylist[[i]] = function(...)

mylist[i] and mylist[[i]] are two different things.

On Wed, Mar 21, 2012 at 11:08 AM, David Zastrau  wrote:
> Hello dear R-users,
>
> I am currently trying to fill some datasatructure (array, list, hash...)
> with matrices that are calculated by a function and do vary in size:
>
> mylist = list()
> for(i in 1:n)
>    mylist[i] = function(...) # returns a matrix
>
> print(mylist[1]) # prints only the first element of the matrix
>
>
> How can I store and afterwards access the matrices that are calculated by my
> function?
>
> Any help would be greatly appreciated!
>
> Best Wishes
> David
>

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] glm: getting the confidence interval for an Odds Ratio, when using predict()

2012-03-21 Thread Dominic Comtois
Hi again,

As a follow-up question... If I need to calculate a risk difference with a
CI, I imagine the solution would be similar? Here's how I would get the
point estimate:

x1 <- factor(rbinom(100,1,.5),levels=c(0,1))
x2 <- factor(round(runif(100,1,2)),levels=c(1,2),labels=c("cat1","cat2"))
outcome <- rbinom(100,1,.2)
model <- glm(outcome~x1+x2,family=binomial(logit))
newd <- data.frame(x1=factor(c(0,0),levels=c(0,1)),
   x2=factor(c(1,2),labels=c("cat1","cat2")),
   outcome=c(0,0))

library(boot)
risks <- inv.logit(predict(model,newd))
risk.diff <- risks[2] - risks[1]
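One way to get a CI to go with that point estimate is a percentile bootstrap; a sketch reusing the model formula and newd from above (not a definitive recipe):

```r
dat <- data.frame(outcome, x1, x2)
rd <- function(d, idx) {
  # refit on the resampled rows and recompute the risk difference
  m <- glm(outcome ~ x1 + x2, family = binomial(logit), data = d[idx, ])
  r <- inv.logit(predict(m, newdata = newd))
  r[2] - r[1]
}
b <- boot(dat, rd, R = 999)
boot.ci(b, type = "perc")
```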

Many thanks,

Dominic C.

2012/3/20 Dominic Comtois 

> Case solved. Thanks a lot Peter!
>
> Dominic C.
>
>
> -Message d'origine-
> De : peter dalgaard [mailto:pda...@gmail.com]
> Envoyé : 20 mars 2012 07:57
> À : Dominic Comtois
> Cc : r-help@r-project.org help
> Objet : Re: [R] glm: getting the confidence interval for an Odds Ratio,
> when
> using predict()
>
> [Oops, forgot cc. to list]
>
> On Mar 20, 2012, at 04:40 , Dominic Comtois wrote:
>
> > I apologize for the errors in the previous code. Here is a reworked
> example. It works, but I suspect problems in the se calculation. I changed,
> from the 1st prediction to the 2nd only one covariate, so that the OR's CI
> should be equal to the exponentiated variable's coefficient and ci. And we
> get something different:
>
> Yep. Classical rookie mistake: Forgot to take sqrt() in the se. I then get
>
> > se <- sqrt(contr %*% V %*% t(contr))
> >
> > # display the CI
> > exp(contr %*% coef(model) + qnorm(c(.025,.50,.975))*se)
> [1] 0.655531 1.686115 4.336918
> >
> > # the point estimate is ok, as verified with
> > exp(model$coefficients[3])
>  x2cat2
> 1.686115
> >
> > # however I we'd expect to find upper and lower bound equal # to the
> > exponentiated x2cat coefficient CI exp(confint(model))[3,]
> Waiting for profiling to be done...
>   2.5 %97.5 %
> 0.6589485 4.4331058
>
> which is as close as you can expect since the confint method is a bit more
> advanced than +/-2SE.
>
> -pd
>
>
> > x1 <- factor(rbinom(100,1,.5),levels=c(0,1))
> > x2 <-
> > factor(round(runif(100,1,2)),levels=c(1,2),labels=c("cat1","cat2"))
> > outcome <- rbinom(100,1,.2)
> >
> > model <- glm(outcome~x1+x2,family=binomial(logit))
> > newd <- data.frame(x1=factor(c(0,0),levels=c(0,1)),
> >   x2=factor(c("cat1","cat2"),levels=c("cat1","cat2")),
> >   outcome=c(1,1))
> >
> > M <- model.matrix(formula(model), data=newd) V <- vcov(model) contr <-
> > c(-1,1) %*% M se <- sqrt(contr %*% V %*% t(contr))
> >
> > # display the CI
> > exp(contr %*% coef(model) + qnorm(c(.025,.50,.975))*se)
> >
> > # the point estimate is ok, as verified with
> > exp(model$coefficients[3])
> >
> > # however I we'd expect to find upper and lower bound equal # to the
> > exponentiated x2cat coefficient CI exp(confint(model))[3,]
> >
> > Many thanks,
> >
> > Dominic C.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000
> Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk  Priv: pda...@gmail.com
>
>




[R] "cannot change working directory"

2012-03-21 Thread mela
Hi,

I need to write a program to run some results from an experiment. I have the
results saved in 2 different folders, named "tuesday" and "wednesday" (both
located within the same parent folder). When I wrote the program for "tuesday"
I had no problem running it, but when I changed the working directory to
"wednesday" I got the "cannot change working directory" message. I don't
understand what's different between the two.

my first code was:

setwd("/Users/user/Desktop/Recorded\ results/tuesday")

and my second code is:

setwd("/Users/user/Desktop/Recorded\ results/wednesday")

and i copied the exact location from the terminal, so i can't have typos (i
am using a mac, if that makes any difference).

any suggestions? i will be grateful for any insight.

thanks

mela


--
View this message in context: 
http://r.789695.n4.nabble.com/cannot-change-working-directory-tp4492812p4492812.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] passing xlim to coord_map in ggplot2

2012-03-21 Thread z2.0
The first block of code should be reproducible.

For the second block, you need only a data.frame. I've included a few rows
from the one I'm working with.

Two required libraries: maps, ggplot2.
http://r.789695.n4.nabble.com/file/n4492267/upton_tank_trunc_nabble.csv
upton_tank_trunc_nabble.csv 

--
View this message in context: 
http://r.789695.n4.nabble.com/passing-xlim-to-coord-map-in-ggplot2-tp4490005p4492267.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Check results between two data.frame

2012-03-21 Thread Sarah Goslee
As long as == is an appropriate test for your data, why not just use
R's innate ability to handle matrices/data frames?

> x1 <- matrix(1:20, ncol=4)
> x2 <- ifelse(x1 > 18, 22, x1)
> x1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> x2
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   22
[5,]    5   10   15   22
> x1 == x2
 [,1] [,2] [,3]  [,4]
[1,] TRUE TRUE TRUE  TRUE
[2,] TRUE TRUE TRUE  TRUE
[3,] TRUE TRUE TRUE  TRUE
[4,] TRUE TRUE TRUE FALSE
[5,] TRUE TRUE TRUE FALSE
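To list just the mismatching cells instead of scanning the logical matrix, which() with arr.ind helps:

```r
which(x1 != x2, arr.ind = TRUE)  # row/column index of every difference
```

For floating-point data, isTRUE(all.equal(x1, x2)) is a safer overall check than ==.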


On Wed, Mar 21, 2012 at 8:48 AM, HJ YAN  wrote:
> Dear R-user,
>
> I'm trying to compare two sets of results and wanted to find out which
> element in the two data frame/matrix are different.
>
> I wrote the following function and it works ok, and gives me a long list of
> "good" as outcomes.
>
>
> CHECK<-
> function (x = "file1", y = "file2")
> {
>    for (i in 1:nrow(x)) {
>        for (j in 1:ncol(x)) {
>            if (x[i, j] == y[i, j]) {
>                print("good")
>            }
>            else {
>                print("check")
>            }
>        }
>    }
> }
>
>
> However, as the two datasets I was comparing are large (400*100 roughly),
> so I would like to create a matrix to identify which ones are not same in
> the two dataframes.
>
> So I added 'CHECK_XY' in my code but  when I run it, I got 'Error in
> CHECK_XY[i, j] = c("good") : subscript out of bounds'.
>
> Could anyone help please??
>
> CHECK_1<-
> function (x = "file1", y = "file2")
> {
>    NROW <- nrow(x)
>    NCOL <- ncol(x)
>    CHECK_XY <- as.matrix(NA, NROW, NCOL)
>    for (i in 1:nrow(x)) {
>        for (j in 1:ncol(x)) {
>            if (x[i, j] == y[i, j]) {
>                CHECK_XY[i, j] = c("good")
>            }
>            else {
>                CHECK_XY[i, j] = c("check")
>            }
>        }
>    }
>    print(CHECK_XY)
> }
>
> Thanks!
> HJ

-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] list of matrices

2012-03-21 Thread David Zastrau

Hello dear R-users,

I am currently trying to fill some data structure (array, list, hash...) 
with matrices that are calculated by a function and vary in size:


mylist = list()
for(i in 1:n)
mylist[i] = function(...) # returns a matrix

print(mylist[1]) # prints only the first element of the matrix


How can I store and afterwards access the matrices that are calculated 
by my function?


Any help would be greatly appreciated!

Best Wishes
David



Re: [R] glm.fit: fitted probabilities numerically 0 or 1 occurred?

2012-03-21 Thread ufuk beyaztas
Dear Ellison, 
Many thanks for your reply.

The information you typed is clear and now I know what to do.  Your
suggestion about finding some coffee while running simulation is so good =)

Regards

-
Best regards

Ufuk
--
View this message in context: 
http://r.789695.n4.nabble.com/glm-fit-fitted-probabilities-numerically-0-or-1-occurred-tp4490722p4492436.html
Sent from the R help mailing list archive at Nabble.com.



[R] how to decide best order for ma(x, order, centre=TRUE) in time series

2012-03-21 Thread sagarnikam123
When fitting an autoregressive time series model to the data, i.e. with the function
ma(x, order, centre=TRUE),
how do I decide the best order for a good time series model, which would
be valuable for an arma() process?
I attached a file (time series data):
http://r.789695.n4.nabble.com/file/n4491858/tinku.txt tinku.txt 



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-decide-best-order-for-ma-x-order-centre-TRUE-in-time-series-tp4491858p4491858.html
Sent from the R help mailing list archive at Nabble.com.



[R] Multinomial Logit data arrangement

2012-03-21 Thread sakchai
Dear all,

I am having a hard time attempting to do multinomial logit modeling in R.

I am trying to analyze a dataset in which there are 3 scenarios with 4
different choice parameters. In particular, I am having a hard time
arranging this into a .csv format. What sorts of headings should I put in
the column headers to reflect the variables and parameters for the total
number of choices made by the respondents?

An example can be seen below:

            Situation 1  Situation 2  Situation 3
Parameter 1          28           18            8
Parameter 2          30           20            0
Parameter 3          32           17            3
Parameter 4          31           12            2
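One common arrangement for multinomial-choice tools is long format, one row per scenario/alternative combination; a sketch with illustrative column names, assuming the counts above read 28/18/8, 30/20/0, 32/17/3 and 31/12/2:

```r
choices <- data.frame(
  situation = rep(c("S1", "S2", "S3"), each = 4),
  parameter = rep(paste0("P", 1:4), times = 3),
  count     = c(28, 30, 32, 31,  18, 20, 17, 12,  8, 0, 3, 2)
)
```

Respondent-level covariates (age, education, ...) would then sit in additional columns repeated on each row for that respondent.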

Furthermore, I would like to analyse whether there are any
relationships with age, education, etc. How can I arrange it in a .csv
format for R to perform the function?

I know it seems a trivial problem, but I just can't seem to get my head
around it, and it's the first step of the analysis. Any guidance would
be really helpful! Thanks. Oh yes - I am an MSc student in aquaculture
investigating land use values in mangrove environments.

Sakchai McDonough


--
View this message in context: 
http://r.789695.n4.nabble.com/Multinomial-Logit-data-arrangement-tp4492173p4492173.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] rJava / RCMD javareconf fails

2012-03-21 Thread st0ut717
Solved, grrr.

I had to set JAVA_HOME to point to the jdk directory, not the parent.
This had to be done as root, not as a user; otherwise the rJava install fails.

At the end of the day it's installed.




--
View this message in context: 
http://r.789695.n4.nabble.com/rJava-RCMD-javareconf-fails-tp4488961p4492094.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] anova.lm F test confusion

2012-03-21 Thread msteane
What if the models aren't nested, i.e. I want to compare y=x+w vs y=x+z+v? Is
there a valid test/method to compare these (other than comparing R-squared
values)? They are both multiple regression models.
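For non-nested linear models, likelihood-based criteria are the usual route; a minimal sketch (y, x, w, z, v standing for columns of some data frame dat):

```r
m1 <- lm(y ~ x + w, data = dat)
m2 <- lm(y ~ x + z + v, data = dat)
AIC(m1, m2)  # lower is better; BIC(m1, m2) penalizes model size more
```

Formal non-nested comparisons also exist, e.g. the J and Cox tests in the lmtest package (jtest(), coxtest()).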

--
View this message in context: 
http://r.789695.n4.nabble.com/anova-lm-F-test-confusion-tp4490211p4492402.html
Sent from the R help mailing list archive at Nabble.com.



[R] Check results between two data.frame

2012-03-21 Thread HJ YAN
Dear R-user,

I'm trying to compare two sets of results and wanted to find out which
element in the two data frame/matrix are different.

I wrote the following function and it works ok, and gives me a long list of
"good" as outcomes.


CHECK<-
function (x = "file1", y = "file2")
{
for (i in 1:nrow(x)) {
for (j in 1:ncol(x)) {
if (x[i, j] == y[i, j]) {
print("good")
}
else {
print("check")
}
}
}
}


However, the two datasets I was comparing are large (roughly 400*100), so
I would like to create a matrix that identifies which elements are not the
same in the two data frames.

So I added 'CHECK_XY' in my code but  when I run it, I got 'Error in
CHECK_XY[i, j] = c("good") : subscript out of bounds'.

Could anyone help please??

CHECK_1<-
function (x = "file1", y = "file2")
{
NROW <- nrow(x)
NCOL <- ncol(x)
CHECK_XY <- as.matrix(NA, NROW, NCOL)
for (i in 1:nrow(x)) {
for (j in 1:ncol(x)) {
if (x[i, j] == y[i, j]) {
CHECK_XY[i, j] = c("good")
}
else {
CHECK_XY[i, j] = c("check")
}
}
}
print(CHECK_XY)
}
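The 'subscript out of bounds' error comes from as.matrix(NA, NROW, NCOL), which ignores the extra arguments and returns a 1x1 matrix; matrix() is what allocates the full result:

```r
CHECK_XY <- matrix(NA, nrow = NROW, ncol = NCOL)  # not as.matrix(...)
```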

Thanks!
HJ




Re: [R] Not colour but symbols

2012-03-21 Thread Komine
Hi
Thank you Bert and Thomas for your help, I did what I wanted with this code.
>test<-c(4,8,9,6,7)
>barplot(test,density =20,angle=45)

But I want to cross the lines in each bar.
Please, how to do it? 
Thank you in advance. 
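Overlaying a second barplot() call with the opposite angle produces the crossed hatching; a sketch using add = TRUE:

```r
test <- c(4, 8, 9, 6, 7)
barplot(test, density = 20, angle = 45)
barplot(test, density = 20, angle = -45, add = TRUE)  # second hatch direction
```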

--
View this message in context: 
http://r.789695.n4.nabble.com/Not-colour-but-symbols-tp4490030p4491785.html
Sent from the R help mailing list archive at Nabble.com.





[R] nlme error on dimensions in multiplication

2012-03-21 Thread Thomas Wutzler
Hello R users,

When trying to fit a nonlinear mixed model to a respiration time series,
I get the following error message:

Error in recalc.varFunc(object[[i]], conLin) :
  dims [product 30] do not match the length of object [34]
In addition: Warning message:
In conLin$Xy * varWeights(object) :
  longer object length is not a multiple of shorter object length

Below is an example that generates this message on
Windows XP
R 2.14.0
nlme 3.1-103

---
library(nlme)
#load(file="tmp/exampleNmleError.RData")#ds1, fm1, ctrl
con <- url("http://www.bgc-jena.mpg.de/~twutz/tmp/exampleNmleError.RData";)
load(file=con)  #ds1, fm1, ctrl
close(con)

ds1 <- subset(ds1, select=c("ColNoM","minT","resp"))
head(ds1)
plot( resp ~ minT, col=rainbow(length(unique(ColNoM)))[ColNoM], ds1)
lines( fitted.values(fm1) ~ minT, ds1 ) # starting values from glns fit

ds2 <- ds1
# ds2 <- unique(ds1)# does not resolve the error

tmp.fit1 <- nlme(
resp ~ exp(beta0l) + beta1 * exp(-exp(beta2l)*minT)
,fixed=list(beta0l+beta1+beta2l~1)  
,random= beta0l ~ 1 | ColNoM
, weights =varPower(fixed=0.4, form=~resp)
, start=list(fixed = coef(fm1))
#, method='REML'
,data=ds2
#,control=ctrl
)

---
The exp function in the example is used to constrain estimates of the
coefficients to positive values, when actually estimating log of the
coefficients with nlme.

When I try to debug, I find a mismatch between the dims of the weights
and the dims of the matrix conLin$Xy generated by the function MEdecomp,
which reduced the original 34 rows to just 6 rows.
I tried to remove double entries in my example, but this did not help.

How can I make proper use of nlme?


Yours,
Thomas



Re: [R] No its not working, how to calculate best p order value for arma process

2012-03-21 Thread sagarnikam123
> ts<-ts(bhavar$V1)
> decompose(ts)
Error in decompose(ts) : time series has no or less than 2 periods
> HoltWinters(ts)
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f),
seasonal) : 
  time series has no or less than 2 periods
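Both errors say the series has no seasonal period: ts() defaults to frequency = 1, while decompose() and HoltWinters() need at least two full periods. A sketch, with 12 standing in for whatever the true period of the data is:

```r
ts2 <- ts(bhavar$V1, frequency = 12)  # assume e.g. monthly data; use the real period
decompose(ts2)                        # works once the series spans >= 2 periods
```

For non-seasonal data, HoltWinters(ts, gamma = FALSE) runs without a seasonal component.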

Is it true that the order selected by the ar(ts) function is the p value
to use in an arma(p,q) calculation?
> ar(ts)

Call:
ar(x = ts)

Coefficients:
      1        2        3        4        5        6        7        8  
-0.0979  -0.0380  -0.1823  -0.1096   0.1806   0.0414  -0.1439   0.1093  
      9       10       11  
 0.0348  -0.1580   0.2900  

Order selected 11  sigma^2 estimated as  0.4339 
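ar() selects its order by AIC by default (11 here), which is a defensible p for a pure AR fit; for a mixed arma(p,q) it is more common to search over both orders. A rough sketch (ts is the series object from above; some orders may fail to converge, so wrap the fit in try() on real data):

```r
grid <- expand.grid(p = 0:3, q = 0:3)
grid$aic <- apply(grid, 1, function(o)
  AIC(arima(ts, order = c(o["p"], 0, o["q"]))))
grid[which.min(grid$aic), ]  # smallest-AIC (p, q)
```

forecast::auto.arima() automates the same idea.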

--
View this message in context: 
http://r.789695.n4.nabble.com/how-calculate-seasonal-component-cyclic-component-of-time-series-tp4491470p4491673.html
Sent from the R help mailing list archive at Nabble.com.



[R] Type II and III sum of squares (R and SPSS)

2012-03-21 Thread Marco Tommasi
To whom it may concern

I ran some analyses with R using the command Anova. However, I found 
some problems with the output obtained by selecting type II or type III 
sums of squares.

Briefly, I have to do a 2x3 mixed-model ANOVA, wherein the first factor 
is a between factor and the second factor is a within factor. I use the 
command Anova in the list below, because I also want to obtain the sums 
of squares of the linear and quadratic contrasts between the levels of 
the within factor.




Below I report the list of commands used in R ("fattA" is the between 
factor and "fB" is the within factor):

 > a1b1<-c(10,9,8,7)
 > a1b2<-c(7,6,4,5)
 > a1b3<-c(3,2,3,4)
 > a2b1<-c(9,9,8,7)
 > a2b2<-c(8,7,9,7)
 > a2b3<-c(7,8,8,6)
 >
 > M3<-matrix(0,8,4)
 > M3[,1]<-cbind(a1b1,a2b1)
 > M3[,2]<-cbind(a1b2,a2b2)
 > M3[,3]<-cbind(a1b3,a2b3)
 > M3[,4]<-rep(c(1,2),each=4)
 >
 > colnames(M3)<-c("b1","b2","b3","fattA")
 >
 > M3<-as.data.frame(M3)
 >
 > require(car)
Loading required package: car
Loading required package: MASS
Loading required package: nnet
 > f5<-lm(cbind(b1,b2,b3)~fattA,data=M3)
 > a5<-Anova(f5)

 > f6<-lm(c(b1,b2,b3)~rep(fattA,3),data=M3)
 >
 >
 > fB<-factor(c(1:3))
 > d2<-data.frame(fB)
 > a6<-Anova(f5,idata=d2,idesign=~fB,type=2)

 > summary(a6,multivariate=F)

Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                 SS num Df Error SS den Df       F    Pr(>F)
(Intercept) 1080.04      1   9.5833      6 676.200 2.133e-07 ***
fattA         26.04      1   9.5833      6  16.304  0.006819 **
fB            42.33      2  10.6667     12  23.812 6.645e-05 ***
fattA:fB      20.33      2  10.6667     12  11.438  0.001660 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Mauchly Tests for Sphericity

  Test statistic p-value
fB  0.87891  0.7242
fattA:fB0.87891  0.7242


Greenhouse-Geisser and Huynh-Feldt Corrections
  for Departure from Sphericity

   GG eps Pr(>F[GG])
fB   0.89199  0.0001474 ***
fattA:fB 0.89199  0.0026452 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  HF eps Pr(>F[HF])
fB   1.2438  6.645e-05 ***
fattA:fB 1.24380.00166 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Warning message:
In summary.Anova.mlm(a6, multivariate = F) : HF eps > 1 treated as 1


I repated the anlysis by setting type III sum of squares and I obtained:

 > a6<-Anova(f5,idata=d2,idesign=~fB,type=3)
 > summary(a6,multivariate=F)

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                SS num Df Error SS den Df      F   Pr(>F)
(Intercept) 30.817      1   9.5833      6 19.294 0.004606 **
fattA       26.042      1   9.5833      6 16.304 0.006819 **
fB          40.133      2  10.6667     12 22.575 8.57e-05 ***
fattA:fB    20.333      2  10.6667     12 11.438 0.001660 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Mauchly Tests for Sphericity

  Test statistic p-value
fB  0.87891  0.7242
fattA:fB0.87891  0.7242


Greenhouse-Geisser and Huynh-Feldt Corrections
  for Departure from Sphericity

   GG eps Pr(>F[GG])
fB   0.89199  0.0001851 ***
fattA:fB 0.89199  0.0026452 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  HF eps Pr(>F[HF])
fB   1.2438   8.57e-05 ***
fattA:fB 1.24380.00166 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Warning message:
In summary.Anova.mlm(a6, multivariate = F) : HF eps > 1 treated as 1


As you can see, the sums of squares of the within factor "fB" and of the 
intercept obtained by setting type II sums of squares are different from 
those obtained by setting type III sums of squares. I repeated the 
analysis using SPSS (types II and III) and I obtained the same sums of 
squares for both types, which I report below:

within factor and interaction
source          Sum of squares (type II and III)
fB              42.
fB * fattA      20.
Error(fattB)    10.6667

between factor
Source          Sum of squares (type II and III)
intercept       1080.041667
fattA           26.0417
Error           9.58333

The strangest thing, for me, is not only that R gives different outputs 
for type II and type III sums of squares, but that the output obtained 
with type II sums of squares in R coincides with the type III output 
of SPSS.

As I recall, with a balanced design, type II and III sums of squares 
should give the same result.
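[One frequent cause of this discrepancy (offered as a likely explanation, not
a confirmed diagnosis of this analysis): car's type III tests are only
meaningful under sum-to-zero contrasts, while R's default is treatment
contrasts; type II sums of squares are invariant to the coding, which would
explain why R's type II output matches SPSS's type III. A minimal sketch of
the refit:]

```r
library(car)

# Type III tests require sum-to-zero contrasts; R defaults to treatment
# contrasts, which changes the type III (but not type II) decomposition
options(contrasts = c("contr.sum", "contr.poly"))

M3$fattA <- factor(M3$fattA)                   # ensure fattA is a factor
f5 <- lm(cbind(b1, b2, b3) ~ fattA, data = M3)
a6 <- Anova(f5, idata = d2, idesign = ~fB, type = 3)
summary(a6, multivariate = FALSE)
```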


Is there anybody that can help me about this point?


thank you for your kind attention.

  Marco Tommasi, Ph.D.


  Department of Neuroscience and Imaging
  "G. D'Annunzio" University of Chieti-Pescara
  Via dei Vestini 31
  66100 Chieti
  ITALY

e-mail: marco.tomm...@unich.it 
tel.: +39 0871 3555890 / fax: + 39 0871 3554163
Web site: www.dni.unich.it




[R] Solving ODE via deSolve and optimizing best parameter values.

2012-03-21 Thread mhimanshu
Hello Everyone,

I have just started with the deSolve package. I have a set of ODEs with a
few parameters, for example

dY/dt = a1*Y*(1 - Y/Ymax) - a2*(Y + 0.001)
and
dZ/dt = a3*Y - a4*Z

I have the initial values of Y and Z, and a1, a2, a3 (and a4) are my
parameters. My main objective is to solve the equations and also to find
the best optimized values for both Y and Z and for the parameters
a1, a2, a3. Using the deSolve package I have solved the equations, and now
I am stuck on finding the best values for my parameters.
Can anyone please help me out with this?
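[A minimal sketch of one common approach, assuming a data frame `obs` with
columns `time`, `Y` and `Z` holding the measurements; the start values and
the fixed `Ymax` below are placeholders, not values from the question. The
idea is to solve the system with deSolve::ode() inside an objective function
and minimise the sum of squared residuals with optim(); the FME package
(modFit) wraps this pattern more conveniently.]

```r
library(deSolve)

# Right-hand side of the ODE system from the question
rhs <- function(t, state, p) {
  with(as.list(c(state, p)), {
    dY <- a1 * Y * (1 - Y / Ymax) - a2 * (Y + 0.001)
    dZ <- a3 * Y - a4 * Z
    list(c(dY, dZ))
  })
}

# Sum of squared residuals between model output and observations
sse <- function(p, obs) {
  out <- ode(y = c(Y = obs$Y[1], Z = obs$Z[1]), times = obs$time,
             func = rhs, parms = c(p, Ymax = 100))
  sum((out[, "Y"] - obs$Y)^2 + (out[, "Z"] - obs$Z)^2)
}

# Minimise over the rate parameters (start values are placeholders)
fit <- optim(c(a1 = 0.1, a2 = 0.01, a3 = 0.05, a4 = 0.02), sse, obs = obs)
fit$par
```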
you can also write to me at bioinfo.himan...@gmail.com


Thanks in Advance
Himanshu

--
View this message in context: 
http://r.789695.n4.nabble.com/Solving-ODE-via-deSolve-and-optimizing-best-parameter-values-tp4492505p4492505.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Unable to specify order of a factor

2012-03-21 Thread Justin Montemarano
Ista:

Your attached code did work for me; moreover, the facets were presented in
the desired order with facet_wrap() and facet_grid(), which is what I'm
using because I have a second factor used in facet_grid().

Still, my plots with total.density as a facet are coming out in 16, 32, 8,
and I'm not seeing why.  Below is my plot code -

ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = per.remain))
> + facet_grid(total.density ~ prop.ec) +
> #add point and error bar data
> theme_set(theme_bw()) +
> geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax =
> per.remain + se), width = 3) +
> #add predicted model data
> geom_line(data = se.predict.data[se.predict.data$plant.sp == 'EC',],
> aes(x = x.values, y = predicted.values), colour = c('red')) +
> geom_line(data = dc.predict.data[dc.predict.data$plant.sp == 'EC',],
> aes(x = x.values, y = predicted.values), colour = c('blue'), linetype =
> c('dashed')) +
>
> xlab('Day') + ylab('Percent Mass Remaining') + opts(panel.grid.major =
> theme_blank(), panel.grid.minor = theme_blank())

Is there anything odd about it that might be producing the odd ordering
problem?  FYI, avoiding subsetting ag.tab doesn't do the trick.
-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com


On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn  wrote:

> Hi Justin,
>
> this gives the correct order (8, 16, 32) on my machine:
>
> total.density <-
>
> c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
> total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE)
> str(total.density)
>
> order(levels(total.density))
>
> dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density)))
>
> ggplot(dat, aes(x = v1)) +
>  geom_density() +
>  facet_wrap(~td)
>
> Does it work for you? If yes, then you need to tell us what you're
> doing that is different from this example. If no, please give use the
> output of sessionInfo().
>
> best,
> Ista
>
> On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano 
> wrote:
> > I think I understand, but I believe my original interest is in the order
> of
> > levels(total.density), since ggplot appears to be using that to order the
> > facets.  Thus, I'm still getting three graphs, ordered (and displayed as)
> > 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32.  I'm sorry
> if
> > I wasn't clear and/or I've missed your message.
> > -
> > Justin Montemarano
> > Graduate Student
> > Kent State University - Biological Sciences
> >
> > http://www.montegraphia.com
> >
>




[R] Forloop/ifelse program problem and list of dataframes

2012-03-21 Thread Ian Craig
Hello R Community,

I don't post to these things often, so excuse me if I stumble on my forum
etiquette.  This is a complex problem for me, which may require two forum
entries, but I will try my best to be concise.  Also, I am a self-taught
coder, so if my code is not up to convention, your constructive criticism
is always welcome.

I need to split up a data frame by participant (gpsarc - factor), status
(testpo - factor), and by date (LocalDate), then sum the distance
(Dist_Bef_m - numeric) records for each date and average them across each
status state for each participant.  Each participant has several records
for each date and for at least 1 of 3 different possible status types (max
3).   In the end, I want a table with participant number and the status
state as my column headings with means for each status state under their
appropriate heading (see example below).  I am confident I made this way
more complicated than it needs to be.  I really appreciate any help you can
offer.

Here is my relevant coding so far:

library(plyr)  # for ldply()

s1 <- split(data[, c(4, 10, 20, 42)], data$gpsarc)
for (i in 1:length(s1))
  s1[[i]] <- split(s1[[i]], s1[[i]]$testpo)

s2 <- vector("list", length(s1))
for (i in 1:length(s2))
  s2[[i]] <- ldply(s1[[i]], function(x) {
    # If one status state does not exist it is still present in the split
    # sublist (because testpo is a factor) and would cause an error, so
    # this if/else skips entries with no records (returning NULL drops them)
    if (nrow(x) == 0) {
      NULL
    } else {
      by(x, x[["LocalDate"]], function(x1) sum(x1[["Dist_Bef_m"]]))
    }
  })

s3 <- vector("list", length(s2))
for (i in 1:length(s3))
  s3[[i]] <- data.frame(mean = apply(s2[[i]][, -1], 1, mean, na.rm = TRUE),
                        row.names = as.character(s2[[i]][, 1]))

here is a sample of the s3 result:

[[1]]
 mean
2 12533.2

[[2]]
  mean
2 26300.96
3 25313.93

[[3]]
  mean
1 48489.15
3 27398.23

[[4]]
  mean
1 34783.97

[[5]]
  mean
1 21293.19
2 21962.41
3 18272.67

# I really want it to look like this:

ppt         1         2         3
1          NA  12533.20        NA
2          NA  26300.96  25313.93
3    48489.15        NA  27398.23
4    34783.97        NA        NA
5    21293.19  21962.41  18272.67
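[For what it is worth, the split/loop machinery above can be replaced by two
aggregate() calls and a reshape to wide format -- a sketch, assuming the
columns are named as in the question:]

```r
# Sum distance per participant, status and day, then average across days
daily <- aggregate(Dist_Bef_m ~ gpsarc + testpo + LocalDate, data = data, sum)
means <- aggregate(Dist_Bef_m ~ gpsarc + testpo, data = daily, mean)

# One row per participant, one column per status state
wide <- reshape(means, idvar = "gpsarc", timevar = "testpo",
                direction = "wide")
wide
```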




Re: [R] Unable to specify order of a factor

2012-03-21 Thread Ista Zahn
Hi Justin,

this gives the correct order (8, 16, 32) on my machine:

total.density <-
c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE)
str(total.density)

order(levels(total.density))

dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density)))

ggplot(dat, aes(x = v1)) +
  geom_density() +
  facet_wrap(~td)

Does it work for you? If yes, then you need to tell us what you're
doing that is different from this example. If no, please give use the
output of sessionInfo().

best,
Ista

On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano  wrote:
> I think I understand, but I believe my original interest is in the order of
> levels(total.density), since ggplot appears to be using that to order the
> facets.  Thus, I'm still getting three graphs, ordered (and displayed as)
> 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32.  I'm sorry if
> I wasn't clear and/or I've missed your message.
> -
> Justin Montemarano
> Graduate Student
> Kent State University - Biological Sciences
>
> http://www.montegraphia.com
>



Re: [R] glm.fit: fitted probabilities numerically 0 or 1 occurred?

2012-03-21 Thread Ted Harding
On 21-Mar-2012 S Ellison wrote:
>> I get the errors;
>> glm.fit: fitted probabilities numerically 0 or 1 occurred and
>> glm.fit: algorithm did not converge
>> .
>> Is there any way to 
>> fix this problem? 
> 
> There are two separate issues.
> One is the appearance of fitted values at 0 or 1. 
> The other is the lack of convergence.
> 
> The first is usually not fatal; it means that the probabilities
> are so close to 0 or 1 that a double precision value can't
> distinguish them from 0 or 1.
> Often that's a transient condition during iteration and the
> final fitted values are inside (0,1), but final fitted values
> can also be out there if you have continuous predictor values
> a long way out;  by itself, that usually won't stop a glm.
> 
> The second is a bit more problematic. Sometimes it's just that
> you need to increase the maximum number of iterations (see the
> control= argument and ?glm.control). That is always worth a try
> - use some absurdly high number like 1000 instead of the default
> 25 and go find some coffee while it thinks about it. If that
> solves your problem then you're OK, or at least as OK as you can
> be with a data set that hard to fit.
> But if you're bootstrapping with some anomalous values it is
> also possible that some of your bootstrapped sets have too high
> a proportion of anomalies, and under those conditions it's
> possible that there could be no sensible optimum within reach. 
> 
> One way of dealing with that in a boostrap or other simulation
> context is to check the 'converged' value and if it's FALSE,
> return an NA for your statistic. But of course that is a form
> of censoring; if you have a high proportion of such instances
> you'd be on very thin ice drawing conclusions.
> 
> S Ellison

There is a further general issue. The occurrence of

  glm.fit: fitted probabilities numerically 0 or 1 occurred and
  glm.fit: algorithm did not converge

is usually a symptom of what is called "linear separation".
The simplest instance of this is when there is a single
predictor variable X whose values are, say, ordered as
X[1] < X[2] < ... < X[n-1] < X[n].

If it should happen that the values Y[1] ... Y[n] of the 0/1
response variable Y are all equal to 0 for X[1],...,X[k]
and all equal to 1 for X[k+1],...,X[n], and you are fitting
a glm(Y~X,family=binomial) (which is what you use for logistic
regression), then the formula being fitted is

  log(P(Y=1|X)/P(Y=0|X)) = (X - m)/s  for some values of m and s.

In the circumstances described, for any value of m between

  X[k] < m < X[k+1]

the fit can force P[Y=1|X] --> 0 for X = X[1],...,X[k]
and can force P[Y=1|X] --> 1 for X = X[k+1],...,X[n] by
pushing s --> 0 (so then the first set of (X - m)/s --> -Inf
and the second set of (X - m)/s --> +Inf), thus forcing the
probabilities predicted by the model towards agreeing exactly
with the frequencies observed in the data.

In other words, any value of m satisfying X[k] < m < X[k+1]
*separates* the Y=0 responses from the Y=1 responses: all the
Y=0 responses are for X-values less than m, and all the Y=1
responses are for values of X greater than m. Since such a
value of m is not unique (any value between X[k] and X[k+1]
will do), failure of convergence will be observed.

This can certainly arise from excessively "outlying" data,
i.e. the difference X[k+1] - X[k] is much greater than the
true value of the logistic scale parameter s, so that it
is almost certain that all responses for X <= X[k] will be 0,
and all responses for X >= X[k+1] will be 1.

But it can also readily arise when there are only a few data,
i.e. n is small; and also, even with many data, if the scale
parameter is not large compared with the spacing between
X values. If you run the following code a few times, you
will see that "separation" occurs with about 30% probability.

  n <- 6
  X <- (1:n) ; m <- (n+1)/2 ; s <- 2
  count <- numeric(100)
  for(i in (1:100)) {
Y <- 1*(runif(n) <= exp((X-m)/s)/(1+exp((X-m)/s)))
count[i] <- (max(X[Y==0]) < min(X[Y==1]))
  }
  sum(count)

Similarly, with n <- 10, it occurs about 20% of the time;
for n <- 15 about 15%, and it never gets much below 15%
no matter how large n is.

The more general situation of "linear separation" is where
you have many predictor variables, when there can be a high
probability that the random outcomes Y=0 label one set of
cases, the Y=1 label another set of cases, and the X-vectors
for Y=0 can be separated from the X-vectors for Y=1 by a
linear form, i.e. there is a vector of coefficients beta
such that

  X[for any Y==0]%*%beta < X[for any Y==1]%*%beta

This also can arise quite readily in practice.

There is no definitive procedure for coping with this situation
when it arises. What the fitting procedure is doing is just
what it is supposed to do, namely predicting P(Y=1)=0 for
every X such that the observed Y=0, and predicting P(Y=1)=1
for every X such that the observed Y=1 whenever it can (i.e.
whenever there is separation).

[R] Using extract function for dates in sqldf

2012-03-21 Thread Michael . Laviolette

I'm trying to use sqldf to query for the earliest date of a blood test when
patients have had multiple tests in a given year. My query looks like this:

test11 <- sqldf("select CHILD_ID, min(SAMP_DATE)
 from lab
 group by CHILD_ID
 having extract (year from SAMP_DATE) = 2011")

SAMP_DATE has class "date." I get the error message

Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near "from": syntax error)


The problem seems to be in the second "from" where the "extract" function
is called. Does this need a fix or am I doing something wrong?
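[SQLite, sqldf's default backend, does not implement EXTRACT; strftime() is
the usual substitute, and a row-level year filter belongs in WHERE rather
than HAVING. A sketch, assuming SAMP_DATE reaches SQLite as 'YYYY-MM-DD'
text -- R "Date" columns can instead arrive as day counts since 1970, in
which case convert them with as.character() first:]

```r
library(sqldf)

# Earliest 2011 test per child; strftime() pulls the year out of the
# date string, WHERE filters rows before grouping
test11 <- sqldf("select CHILD_ID, min(SAMP_DATE) as first_samp
                 from lab
                 where strftime('%Y', SAMP_DATE) = '2011'
                 group by CHILD_ID")
```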

Thanks in advance and apologies if it turns out a simple error.

-M.L.



Re: [R] Unable to specify order of a factor

2012-03-21 Thread Justin Montemarano
I think I understand, but I believe my original interest is in the order of
levels(total.density), since ggplot appears to be using that to order the
facets.  Thus, I'm still getting three graphs, ordered (and displayed as)
16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32.  I'm sorry if
I wasn't clear and/or I've missed your message.
-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com




Re: [R] Unable to specify order of a factor

2012-03-21 Thread Sarah Goslee
Ah, you're missing something crucial:

> levels(total.density)
[1] "8"  "16" "32"

is giving you the *labels* of the factor, as *strings*, and what you
get if you use order() on them has nothing to do with the order of the
factor levels, and everything to do with the string sort order for
your locale.

> str(levels(total.density))
 chr [1:3] "8" "16" "32"

The factor levels themselves are in the order you specified.

> str(total.density)
 Ord.factor w/ 3 levels "8"<"16"<"32": 1 1 1 1 1 1 1 1 1 1 ...
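[The string-sort behaviour is easy to see in isolation:]

```r
order(c("8", "16", "32"))  # 2 3 1: character sort puts "16" before "8"
order(c(8, 16, 32))        # 1 2 3: numeric sort
```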




On Wed, Mar 21, 2012 at 10:50 AM, Justin Montemarano  wrote:
> Actually I've tried that too, Sarah.
>
> The test is to run order(levels(total.density)), which I need to be 1 2 3,
> not 2 3 1, and your solution still gives me 2 3 1.
>
> I also don't know how to reply to this thread with the previous message
> below...
> -
> Justin Montemarano
> Graduate Student
> Kent State University - Biological Sciences
>

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] AIC models are not all fitted to the same number of observation

2012-03-21 Thread Patrick Giraudoux

Le 21/03/2012 10:56, Patrick Giraudoux a écrit :

Hi,

Using lme from the package nlme  3.1-103, I meet a strange warning. I 
am trying to compare to models with:


library(nlme)
lmez6=lme(lepus~vulpes,random=~1|troncon/an,data=ika_z6_test)
lmez60=lme(lepus~1,random=~1|troncon/an,data=ika_z6_test)

Both have the same number of observations and groups:

lmez6
Linear mixed-effects model fit by REML
  Data: ika_z6_test
  Log-restricted-likelihood: -2267.756
  Fixed: lepus ~ vulpes
(Intercept)  vulpes
 1.35017117  0.04722338

Random effects:
 Formula: ~1 | troncon
(Intercept)
StdDev:   0.8080261

 Formula: ~1 | an %in% troncon
(Intercept)  Residual
StdDev:1.086611 0.4440076

Number of Observations: 1350
Number of Groups:
        troncon an %in% troncon 
            169            1350 


> lmez60
Linear mixed-effects model fit by REML
  Data: ika_z6_test
  Log-restricted-likelihood: -2266.869
  Fixed: lepus ~ 1
(Intercept)
   1.435569

Random effects:
 Formula: ~1 | troncon
(Intercept)
StdDev:   0.8139646

 Formula: ~1 | an %in% troncon
(Intercept)  Residual
StdDev:1.086843 0.4445815

Number of Observations: 1350
Number of Groups:
        troncon an %in% troncon 
            169            1350 

...but when I want to compare their AIC, I get:

AIC(lmez6,lmez60)
   df  AIC
lmez6   5 4545.511
lmez60  4 4541.737
Warning message:
In AIC.default(lmez6, lmez60) :
  models are not all fitted to the same number of observations


Has anybody an explanation about this strange warning ? To what extent 
this warning may limit the conclusions that could be drawn from AIC 
comparison ?


Thanks in advance,

Patrick





Sorry to go on with the thread I created, but the trouble I am meeting 
is above my level in stats... Actually, using an anova approach rather 
than AIC, I get a more informative message:


anova(lmez6, lmez60)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lmez6      1  5 4545.511 4571.543 -2267.756
lmez60     2  4 4541.737 4562.566 -2266.869 1 vs 2 1.774036  0.1829
Warning message:
In anova.lme(lmez6, lmez60) :
  Fitted objects with different fixed effects. REML comparisons are not 
meaningful.


And fumbling a bit more, I found that this was an effect of fitting 
the model using REML. If fitted using ML, things go (apparently) 
smoothly:


lmez6=lme(lepus~vulpes,random=~1|troncon/an,data=ika_z6_test,method="ML")
> lmez60=lme(lepus~1,random=~1|troncon/an,data=ika_z6_test,method="ML")
> anova(lmez6, lmez60)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lmez6      1  5 4536.406 4562.445 -2263.203
lmez60     2  4 4538.262 4559.093 -2265.131 1 vs 2 3.856102  0.0496

> AIC(lmez6,lmez60)
   df  AIC
lmez6   5 4536.406
lmez60  4 4538.262

Now I have the following problem. What I understood from Pinheiro and 
Bates's book and some forums is that ML estimates are biased to some 
extent, tending to underestimate variance parameters. So ML is probably 
not to be recommended; however, the results look consistent here.


Thus, I am lost. The two models look to me clearly nested (one is just 
a null model with only the intercept to estimate, and the other has the 
intercept + one numeric independent variable; both have the same random 
effects, the same response variable and the same number of observations). 
The warnings, from this point of view, sound inconsistent. They are 
probably not, but that is beyond my understanding...
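[One explanation for the AIC() warning, offered as an interpretation rather
than something from the original thread: for a REML fit the likelihood is
defined on the residual space of the fixed-effects design, so logLik()
reports N - p "effective" observations, where p is the number of
fixed-effect coefficients -- and p differs between the two models. A
hypothetical check on the poster's fitted objects:]

```r
# With REML, nobs = N - p differs between the two fixed-effects
# specifications, which is what AIC.default compares
attr(logLik(lmez6), "nobs")    # intercept + vulpes: 1350 - 2
attr(logLik(lmez60), "nobs")   # intercept only:     1350 - 1
```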


Any idea ?

Patrick



Re: [R] Unable to specify order of a factor

2012-03-21 Thread Justin Montemarano
Actually I've tried that too, Sarah.

The test is to run order(levels(total.density)), which I need to be 1 2 3,
not 2 3 1, and your solution still gives me 2 3 1.

I also don't know how to reply to this thread with the previous message
below...
-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com




Re: [R] Unable to specify order of a factor

2012-03-21 Thread Sarah Goslee
Is this what you need?

> total.density <-
+ 
c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
> total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE)
> str(total.density)
 Ord.factor w/ 3 levels "8"<"16"<"32": 1 1 1 1 1 1 1 1 1 1 ...


On Wed, Mar 21, 2012 at 10:26 AM, Justin Montemarano  wrote:
> Hi all:
>
> I'm attempting to create a faceted plot with ggplot2 and I'm having issues
> with a factor's order that is used to define the facet_grid().
>
> The factor (named total.density) has three levels - 8, 16, and 32 - and I
> would like them presented in that order.  Running
> order(levels(total.density)) yields the incorrect order of the facet grid -
> 2 3 1, corresponding with 16, 32, and 8.
>
> I have attempted correcting the order with the following solutions (of
> course, not run at once):
>
> #total.density <- relevel(total.density, '8')
>        #total.density <- as.numeric(levels(total.density)[total.density])
>        #total.density <- factor(total.density, levels = c('8','16','32'))
>        #total.density <- factor(total.density, levels =
> levels(total.density)[c(3,1,2)])
>        #library(gregmisc)
>        #total.density <- reorder.factor(total.density, c('8', '16', '32'),
> order = T)
>
> The data are as follows:
>
> total.density <-
> c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
>
> I'm running R 2.14.2 with all packages up-to-date as of 21.3.2012.
>
> Any help would be greatly appreciated.
>
> -

-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] Unable to specify order of a factor

2012-03-21 Thread Justin Montemarano
Hi all:

I'm attempting to create a faceted plot with ggplot2 and I'm having issues
with a factor's order that is used to define the facet_grid().

The factor (named total.density) has three levels - 8, 16, and 32 - and I
would like them presented in that order.  Running
order(levels(total.density)) yields the incorrect order of the facet grid -
2 3 1, corresponding with 16, 32, and 8.

I have attempted correcting the order with the following solutions (of
course, not run at once):

#total.density <- relevel(total.density, '8')
#total.density <- as.numeric(levels(total.density)[total.density])
#total.density <- factor(total.density, levels = c('8','16','32'))
#total.density <- factor(total.density, levels =
levels(total.density)[c(3,1,2)])
#library(gregmisc)
#total.density <- reorder.factor(total.density, c('8', '16', '32'),
order = T)

The data are as follows:

total.density <-
c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)

I'm running R 2.14.2 with all packages up-to-date as of 21.3.2012.

Any help would be greatly appreciated.

-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com




Re: [R] glmnet() vs. lars()

2012-03-21 Thread Patrick Breheny

On 03/21/2012 06:30 AM, Vito Muggeo (UniPa) wrote:

It appears that glmnet(), when "selecting" the covariates entering the
model, skips from K covariates, say, to K+2 or K+3. Thus 2 or 3
variables are "added" at the same time and it is not possible to obtain
a ranking of the covariates according to their importance in the model.
On the other hand lars() "adds" the covariates one at a time.
My question is: is it possible to obtain a similar output of lars (in
terms of order of the variables entering the model) using glmnet()?


glmnet() is based on an iterative coordinate descent algorithm applied 
to a grid of lambda values; LARS is a more elegant algorithm and 
computes exact solutions.  You can get your glmnet solutions to have 
higher resolution (more "exact") by using a finer grid.  In your example:



set.seed(123)
x=matrix(rnorm(100*20),100,20)
y=rnorm(100)
fit1=glmnet(x,y)
fit1$df

 [1]  0  2  4  4 ...

The default is a grid of 100 lambda values.  If we use 300 values, the 
resolution is finer and we can see the variables enter one at a time:


> fit1=glmnet(x,y,nlambda=300)
> fit1$df
  [1]  0  1  1  2  3  3  4  ...

However, it is impossible to know in advance how fine the grid must be 
in order to ensure that only one variable enters the model between any 
two consecutive lambda values.


--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky



Re: [R] Best way to compute the difference between two levels of a factor ?

2012-03-21 Thread S Ellison
 

> -Original Message-
> Okay, try this:
> 
>   result <- with(data,
>  aggregate(data[,-(1:2)], by=list(ID), FUN=diff))
> 
> This assumes that the dataframe is sorted as in your example. 
> If that's not the case, then use order to arrange it first:


A caveat: order and sort will return order of factor levels for factors and 
that is not necessarily alphanumeric order. The default order here is (T1, T2) 
but that need not be the case if the levels were allocated explicitly. So make 
sure that the factor _levels_ are in the order you expect!

Steve Ellison



[R] resetting console

2012-03-21 Thread Ben quant
Hello,

I'm still hoping my issue is preventable and not worthy of a bug/crash
report, hence my post is in 'help'. Anyway, I'd like to know how to reset
the console so it is clear of all residual effects caused by previous
scripts. Details:

I run a script once and it runs successfully (but very slowly). The script
uses fairly sizable data. No problem so far.

Then I run the same exact script again and the console eventually crashes.
Naturally, I thought it was the size of the data so I:

1) run script successfully (which includes a plot)

2) do this:
dev.off()
rm(list=ls(all=TRUE))
gc()

3) run script again

...and it still crashes. There isn't an R error or anything, I just get a
Microsoft error report request window and the console goes away.

However, if I:

1) run script successfully

2) shut down, and reopen console

3) run script again

...everything runs as expected and the console does not crash.

If the script runs successfully the first time and I'm clearing all
available memory (I think) what is 'remaining' that I need to reset in the
console (that restarting the console solves/clears out)?

PS - Because the script runs successfully, I'm thinking the script itself
is not all that important. I'd prefer an answer that indicates generally
how to reset the console. Basically I'm loading some data, manipulating the
data including a log transformation, doing a gam fit (family="binomial"),
and finally I plot the data from the fit. Interestingly, if I set family to
"gaussian" or if I do not log transform the data, the console does not
crash. Or should I post a crash/bug?

Regards,

Ben




Re: [R] How to do 2SLS in R

2012-03-21 Thread Priya . Saha
Thanks Arne and Achim.
I will surely use the forum on systemfit at R-Forge for further clarification.

Regards,
Priya

From:   Arne Henningsen 
To: Achim Zeileis , priya.s...@zycus.com, 
r-help@r-project.org
Date:   21-03-2012 18:35
Subject:Re: [R] How to do 2SLS in R



On 21 March 2012 09:32, Achim Zeileis  wrote:
> On Wed, 21 Mar 2012, priya.s...@zycus.com wrote:
>> Hi List
>> I want to carry out a structural model. The following example I have taken from
>> Basic Econometrics - Damodar Gujarati:
>
> Look at the "systemfit" package. For an introduction see the accompanying
> JSS paper: http://www.jstatsoft.org/v23/i04/

@Achim: Thanks for referring to the systemfit package and the
corresponding JSS paper.

@Priya: The command "tsls" (package "sem") can only estimate
single-equation models, while systemfit can also estimate
multiple-equation models. I hope that the documentation of systemfit
as well as the JSS paper are sufficiently clear. If you still have
questions regarding systemfit, please use a forum or "tracker" on
systemfit's R-Forge site (see http://systemfit.org).

Best wishes,
Arne

-- 
Arne Henningsen
http://www.arne-henningsen.name







Re: [R] Reshaping data from long to wide without a "timevar"

2012-03-21 Thread Paul Miller
Hello All,

Tried some more Internet searches and came to the conclusion that one probably 
does need to create a "timevar" before reshaping from long to wide. Below is 
some code that creates the "timevar" and transposes the data. 

connection <- textConnection("
005 1 Gemcitabine
005 2 Erlotinib
006 1 Gemcitabine
006 3 Erlotinib
006 2 Paclitaxel
009 1 Gemcitabine
009 2 Erlotinib
010 1 Gemcitabine
010 2 Erlotinib
010 3 Herceptin
")

TestData <- data.frame(scan(connection, list(Subject = 0, RowNo = 0, Drug = 
"")))
TestData$Drug <- as.character(TestData$Drug)
TestData$Drug <- with(TestData, ifelse(Drug == "Gemcitabine", "  Gemcitabine", 
Drug))
TestData$Drug <- with(TestData, ifelse(Drug == "Erlotinib", " Erlotinib", Drug))
TestData <- with(TestData, TestData[order(Subject,Drug), c("Subject", "Drug")])

require(reshape)
Qualifying_Regimen <- TestData
Qualifying_Regimen$Ordvar <- with(TestData, ave(1:nrow(Qualifying_Regimen), 
Subject, FUN = seq_along)) 
Qualifying_Regimen <- reshape(Qualifying_Regimen, direction="wide", 
idvar="Subject", timevar="Ordvar", v.names="Drug")

TestData
Qualifying_Regimen
 
The code for creating the timevar came from a response to an earlier post by 
Joshua Wiley which can be found at: 

http://tolstoy.newcastle.edu.au/R/e15/help/11/07/0387.html

All in all this works pretty well, and making the "timevar" is easy once you 
know how. If the gods are listening though, it would be nice if reshape could 
transpose from long to wide based solely on the order in which observations 
occur in the data. (Assuming of course that it can't do so already.)

Time for a small confession. I've referred in my posts to alphabetical sorting 
of my Drug column and to wanting the transposed columns to be in alphabetical 
order. In my actual data, I added two spaces before the drug "Gemcitabine" and 
one space before the drug "Erlotinib" so that these drugs would always come 
first and second in the sort order. Unfortunately, I neglected to mention I had 
done this and as a result must have caused people some confusion. My apologies 
for this oversight.

Thanks,

Paul



Re: [R] How to do 2SLS in R

2012-03-21 Thread Arne Henningsen
On 21 March 2012 09:32, Achim Zeileis  wrote:
> On Wed, 21 Mar 2012, priya.s...@zycus.com wrote:
>> Hi List
>> I want to carry out a structural model. The following example I have taken from
>> Basic Econometrics - Damodar Gujarati:
>
> Look at the "systemfit" package. For an introduction see the accompanying
> JSS paper: http://www.jstatsoft.org/v23/i04/

@Achim: Thanks for referring to the systemfit package and the
corresponding JSS paper.

@Priya: The command "tsls" (package "sem") can only estimate
single-equation models, while systemfit can also estimate
multiple-equation models. I hope that the documentation of systemfit
as well as the JSS paper are sufficiently clear. If you still have
questions regarding systemfit, please use a forum or "tracker" on
systemfit's R-Forge site (see http://systemfit.org).

Best wishes,
Arne

-- 
Arne Henningsen
http://www.arne-henningsen.name



Re: [R] Error in fitdist- mle failed to estimate parameters

2012-03-21 Thread Ben Bolker
vinod1  gmail.com> writes:

> I am trying fit certain data into Beta distribution. I get the error saying
> "Error in fitdist(discrete_random_variable_c, "beta", start = NULL, fix.arg
> = NULL) :   the function mle failed to estimate the parameters,   with the
> error code 100"
> 
> Below is the sorted data that I am trying to fit. Where am I going wrong.
> 
> Thanks a lot for any help.
> 
> Vinod

 Works for me.  In the future, please (1) put the data in a more
convenient format (dput() works nicely) and (2) let us know what
non-base packages you're using (I had to guess at fitdistrplus)

## x <- scan("betatmp.dat")
## dput(x,"")
x <- c(2.05495e-05, 3.68772e-05, 4.21994e-05, 4.38481e-05, 5.55001e-05, 
5.74267e-05, 6.27489e-05, 6.43976e-05, 6.64938e-05, 7.40247e-05, 
7.60495e-05, 7.90767e-05, 8.07253e-05, 8.60475e-05, 8.70433e-05, 
9.23773e-05, 9.45742e-05, 9.76995e-05, 9.93482e-05, 9.96262e-05, 
0.000101275, 0.000103371, 0.000106597, 0.000108693, 0.000110342, 
0.000110902, 0.000112927, 0.000116224, 0.000118249, 0.000119346, 
0.000119898, 0.000120773, 0.000121994, 0.000122925, 0.000123921, 
0.000131451, 0.000134577, 0.000136225, 0.000136774, 0.000138422, 
0.000139895, 0.000140519, 0.000141322, 0.000142543, 0.000150074, 
0.00015475, 0.000155126, 0.000156223, 0.000156775, 0.00015765, 
0.000161545, 0.000163194, 0.000164621, 0.000165842, 0.00016612, 
0.000166402, 0.000167769, 0.000173373, 0.000173651, 0.000174846, 
0.000176273, 0.000177396, 0.0001782, 0.000179421, 0.000183522, 
0.000183744, 0.000186952, 0.000187267, 0.000192274, 0.000193371, 
0.000197945, 0.000202719, 0.000203267, 0.000207816, 0.00021025, 
0.00021315, 0.00021392, 0.000218694, 0.00022162, 0.000230248, 
0.0002308, 0.000231675, 0.000240145, 0.000244693, 0.000252224, 
0.000266343, 0.000308765, 0.000422837, 0.000429537, 0.000443386
)
library(fitdistrplus)
fitdist(x,"beta",start=NULL)
          estimate   Std. Error
shape1     4.28803    0.5101352
shape2 27534.67757 3373.9026660
hist(x,col="gray",freq=FALSE,breaks=20)
curve(dbeta(x,4.288,27534),add=TRUE,col=2)



Re: [R] Best way to compute the difference between two levels of a factor ?

2012-03-21 Thread Peter Ehlers

Here's the plyr way I should have thought of earlier:

 require(plyr)
 ddply(data, "ID", numcolwise(diff))

Still requires your data to be ordered.

Peter Ehlers

On 2012-03-21 04:51, Eik Vettorazzi wrote:

Hi Sylvain,

assuming your data frame is ordered by ID and TIME, how about this
aggregate(cbind(X,Y)~ID,data, function(x)(x[2]-x[1]))

#or doing this for all but the first 2 columns of data:
aggregate(data[,-(1:2)],by=list(data$ID), function(x)(x[2]-x[1]))

cheers.

On 21.03.2012 09:48, wphantomfr wrote:

Dear R-help Members,


I am wondering if anyone can think of an optimal way of computing, for
several numeric variables, the difference between 2 levels of a factor.


To be clear let's generate a simple data frame with 2 numeric variables
collected for different subjects (ID) and 2 levels of a TIME factor
(time of evaluation)

data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9))


  ID TIME         X         Y
1 AA   T1  9.959540 11.140529
2 AA   T2 12.949522  9.896559
3 BB   T1  9.039486 13.469104
4 BB   T2 10.056392 14.632169
5 CC   T1  8.706590 14.939197
6 CC   T2 10.799296 10.747609

I want to compute for each subject and each variable (X, Y, ...) the
difference between T2 and T1.

Until today I do it by reshaping my dataframe to the wide format (the
columns are then ID, X.T1, X.T2, Y.T1,Y.T2) and then  compute the
difference between successive  columns one by one :
data$Xdiff=data$X.T2-data$X.T1
data$Ydiff=data$Y.T2-data$Y.T1
...

but this way is probably not optimal if the difference has to be
computed for a large number of variables.

How will you handle it ?


Thanks in advance

Sylvain Clément





Re: [R] Graphic legend with mathematical symbol, numeric variable and character variable

2012-03-21 Thread Peter Ehlers
On 2012-03-20 06:09, "ECOTIÈRE David (Responsable d'activité) - CETE 
Est/LRPC de Strasbourg/6 Acoustique" wrote:

Hi,

I'd like to make a legend with a mix of a mathematical symbol (tau), a
numeric variable and character variables. I have tried:


types<-c("Type 1","Type 2","Type 2")
tau<-c(1,3,2)
legend(x="topright",legend=paste(types,"tau=",expression(tau)))



but it doesn't work: the 'tau' symbol is not written in its 'symbol
style' but as 'tau'

Any (good) idea ?
Thank you in advance !

David


Does this do what you want:

  types <- paste('Type', 1:3, sep='~')
  mytau <- 4:6
  leg <- parse(text=paste(types, "~~tau==", mytau, sep='~'))
  plot(0)
  legend('topright', legend=leg)

The '~' symbols generate spaces; use more if you want more spacing.


Peter Ehlers



Re: [R] Best way to compute the difference between two levels of a factor ?

2012-03-21 Thread Eik Vettorazzi
Hi Sylvain,

assuming your data frame is ordered by ID and TIME, how about this
aggregate(cbind(X,Y)~ID,data, function(x)(x[2]-x[1]))

#or doing this for all but the first 2 columns of data:
aggregate(data[,-(1:2)],by=list(data$ID), function(x)(x[2]-x[1]))

cheers.

On 21.03.2012 09:48, wphantomfr wrote:
> Dear R-help Members,
> 
> 
> I am wondering if anyone can think of an optimal way of computing, for
> several numeric variables, the difference between 2 levels of a factor.
> 
> 
> To be clear let's generate a simple data frame with 2 numeric variables 
> collected for different subjects (ID) and 2 levels of a TIME factor
> (time of evaluation)
> 
> data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9))
> 
> 
>   ID TIME         X         Y
> 1 AA   T1  9.959540 11.140529
> 2 AA   T2 12.949522  9.896559
> 3 BB   T1  9.039486 13.469104
> 4 BB   T2 10.056392 14.632169
> 5 CC   T1  8.706590 14.939197
> 6 CC   T2 10.799296 10.747609
> 
> I want to compute for each subject and each variable (X, Y, ...) the
> difference between T2 and T1.
> 
> Until today I do it by reshaping my dataframe to the wide format (the
> columns are then ID, X.T1, X.T2, Y.T1,Y.T2) and then  compute the
> difference between successive  columns one by one :
> data$Xdiff=data$X.T2-data$X.T1
> data$Ydiff=data$Y.T2-data$Y.T1
> ...
> 
> but this way is probably not optimal if the difference has to be
> computed for a large number of variables.
> 
> How will you handle it ?
> 
> 
> Thanks in advance
> 
> Sylvain Clément
> 


-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790

--
Mandatory disclosures under the German Act on Electronic Commercial Registers,
Cooperative Registers and the Company Register (EHUG):

Universitätsklinikum Hamburg-Eppendorf; public-law corporation;
place of jurisdiction: Hamburg

Board members: Prof. Dr. Guido Sauter (deputy chairman), Dr. Alexander
Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus


Re: [R] glm.fit: fitted probabilities numerically 0 or 1 occurred?

2012-03-21 Thread S Ellison
> I get the errors;
> glm.fit: fitted probabilities numerically 0 or 1 occurred and
> glm.fit: algorithm did not converge
> .
> Is there any way to to 
> fix this problem? 

There are two separate issues.
One is the appearance of fitted values at 0 or 1. 
The other is the lack of convergence.

The first is usually not fatal; it means that the probabilities are so close to 
0 or 1 that a double precision value can't distinguish them from 0 or 1. Often 
that's a transient condition during iteration and the final fitted values are 
inside (0,1), but final fitted values can also be out there if you have 
continuous predictor values a long way out;  by itself, that usually won't stop 
a glm.

The second is a bit more problematic. Sometimes it's just that you need to 
increase the maximum number of iterations (see the control= argument and 
?glm.control). That is always worth a try - use some absurdly high number like 
1000 instead of the default 25 and go find some coffee while it thinks about 
it. If that solves your problem then you're OK, or at least as OK as you can be 
with a data set that hard to fit. 
But if you're bootstrapping with some anomalous values it is also possible that 
some of your bootstrapped sets have too high a proportion of anomalies, and 
under those conditions it's possible that there could be no sensible optimum 
within reach. 

One way of dealing with that in a boostrap or other simulation context is to 
check the 'converged' value and if it's FALSE, return an NA for your statistic. 
But of course that is a form of censoring; if you have a high proportion of 
such instances you'd be on very thin ice drawing conclusions.
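A minimal sketch of both suggestions (the data and model below are made up, not the poster's):

```r
## made-up data: two overlapping groups, so the fit is well behaved
set.seed(1)
x <- c(rnorm(50), rnorm(50, mean = 3))
y <- rep(0:1, each = 50)

## raise the iteration cap via glm.control, then keep the statistic
## only if the fit actually converged (NA otherwise)
fit  <- glm(y ~ x, family = binomial, control = glm.control(maxit = 1000))
stat <- if (fit$converged) coef(fit)[["x"]] else NA
```

In a bootstrap loop the same `if (fit$converged)` guard returns NA for non-converged replicates; counting those NAs afterwards shows how much censoring you are living with.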

S Ellison




Re: [R] Best way to compute the difference between two levels of a factor ?

2012-03-21 Thread wphantomfr

Okay, try this:

 result <- with(data,
aggregate(data[,-(1:2)], by=list(ID), FUN=diff)) 


That's it!! I didn't know the "diff" function. Your solution works
perfectly.



Thanks Peter for this !

Sylvain


On 21/03/12 12:01, Peter Ehlers wrote:

On 2012-03-21 03:37, wphantomfr wrote:

Thanks Peter for your fast answer.


Yours is really nice, but if I have say 20 variables I have to write 20
statements like "DIF.X = X[TIME=="T2"] - X[TIME=="T1"]".

Does anyone have a trick to avoid this? It may not be easily possible.


Okay, try this:

 result <- with(data,
aggregate(data[,-(1:2)], by=list(ID), FUN=diff))

This assumes that the dataframe is sorted as in your example. If
that's not the case, then use order to arrange it first:

 data <- with(data, data[order(ID, TIME), ])


Peter Ehlers



On 21/03/12 11:03, Peter Ehlers wrote:

On 2012-03-21 01:48, wphantomfr wrote:

Dear R-help Members,


I am wondering if anyone can think of an optimal way of computing, for
several numeric variables, the difference between 2 levels of a factor.


To be clear let's generate a simple data frame with 2 numeric 
variables

collected for different subjects (ID) and 2 levels of a TIME factor
(time of evaluation)

data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9)) 




  ID TIME         X         Y
1 AA   T1  9.959540 11.140529
2 AA   T2 12.949522  9.896559
3 BB   T1  9.039486 13.469104
4 BB   T2 10.056392 14.632169
5 CC   T1  8.706590 14.939197
6 CC   T2 10.799296 10.747609

I want to compute for each subject and each variable (X, Y, ...) the
difference between T2 and T1.

Until today I do it by reshaping my dataframe to the wide format (the
columns are then ID, X.T1, X.T2, Y.T1,Y.T2) and then  compute the
difference between successive  columns one by one :
data$Xdiff=data$X.T2-data$X.T1
data$Ydiff=data$Y.T2-data$Y.T1
...

but this way is probably not optimal if the difference has to be
computed for a large number of variables.

How will you handle it ?


One way is to use the plyr package:

  library(plyr)
  result<- ddply(data, "ID", summarize,
  DIF.X = X[TIME=="T2"] - X[TIME=="T1"],
  DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"])

Peter Ehlers







Re: [R] Best way to compute the difference between two levels of a factor ?

2012-03-21 Thread Peter Ehlers

On 2012-03-21 03:37, wphantomfr wrote:

Thanks Peter for your fast answer.


Yours is really nice, but if I have say 20 variables I have to write 20
statements like "DIF.X = X[TIME=="T2"] - X[TIME=="T1"]".

Does anyone have a trick to avoid this? It may not be easily possible.


Okay, try this:

 result <- with(data,
aggregate(data[,-(1:2)], by=list(ID), FUN=diff))

This assumes that the dataframe is sorted as in your example. If
that's not the case, then use order to arrange it first:

 data <- with(data, data[order(ID, TIME), ])


Peter Ehlers



On 21/03/12 11:03, Peter Ehlers wrote:

On 2012-03-21 01:48, wphantomfr wrote:

Dear R-help Members,


I am wondering if anyone can think of an optimal way of computing, for
several numeric variables, the difference between 2 levels of a factor.


To be clear let's generate a simple data frame with 2 numeric variables
collected for different subjects (ID) and 2 levels of a TIME factor
(time of evaluation)

data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9))


  ID TIME         X         Y
1 AA   T1  9.959540 11.140529
2 AA   T2 12.949522  9.896559
3 BB   T1  9.039486 13.469104
4 BB   T2 10.056392 14.632169
5 CC   T1  8.706590 14.939197
6 CC   T2 10.799296 10.747609

I want to compute for each subject and each variable (X, Y, ...) the
difference between T2 and T1.

Until today I do it by reshaping my dataframe to the wide format (the
columns are then ID, X.T1, X.T2, Y.T1,Y.T2) and then  compute the
difference between successive  columns one by one :
data$Xdiff=data$X.T2-data$X.T1
data$Ydiff=data$Y.T2-data$Y.T1
...

but this way is probably not optimal if the difference has to be
computed for a large number of variables.

How will you handle it ?


One way is to use the plyr package:

  library(plyr)
  result<- ddply(data, "ID", summarize,
  DIF.X = X[TIME=="T2"] - X[TIME=="T1"],
  DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"])

Peter Ehlers





Re: [R] Best way to compute the difference between two levels of a factor ?

2012-03-21 Thread wphantomfr

Thanks Peter for your fast answer.


Yours is really nice, but if I have say 20 variables I have to write 20
statements like "DIF.X = X[TIME=="T2"] - X[TIME=="T1"]".


Does anyone have a trick to avoid this? It may not be easily possible.

Regards
Sylvain Clément





On 21/03/12 11:03, Peter Ehlers wrote:

On 2012-03-21 01:48, wphantomfr wrote:

Dear R-help Members,


I am wondering if anyone can think of an optimal way of computing, for
several numeric variables, the difference between 2 levels of a factor.


To be clear let's generate a simple data frame with 2 numeric variables
collected for different subjects (ID) and 2 levels of a TIME factor
(time of evaluation)

data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9)) 



  ID TIME         X         Y
1 AA   T1  9.959540 11.140529
2 AA   T2 12.949522  9.896559
3 BB   T1  9.039486 13.469104
4 BB   T2 10.056392 14.632169
5 CC   T1  8.706590 14.939197
6 CC   T2 10.799296 10.747609

I want to compute for each subject and each variable (X, Y, ...) the
difference between T2 and T1.

Until today I do it by reshaping my dataframe to the wide format (the
columns are then ID, X.T1, X.T2, Y.T1,Y.T2) and then  compute the
difference between successive  columns one by one :
data$Xdiff=data$X.T2-data$X.T1
data$Ydiff=data$Y.T2-data$Y.T1
...

but this way is probably not optimal if the difference has to be
computed for a large number of variables.

How will you handle it ?


One way is to use the plyr package:

 library(plyr)
 result <- ddply(data, "ID", summarize,
 DIF.X = X[TIME=="T2"] - X[TIME=="T1"],
 DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"])

Peter Ehlers




Thanks in advance

Sylvain Clément







[R] glmnet() vs. lars()

2012-03-21 Thread Vito Muggeo (UniPa)

dear all,

It appears that glmnet(), when "selecting" the covariates entering the 
model, skips from K covariates, say, to K+2 or K+3. Thus 2 or 3 
variables are "added" at the same time and it is not possible to obtain 
a ranking of the covariates according to their importance in the model. 
On the other hand lars() "adds" the covariates one at a time.
My question is: is it possible to obtain output similar to that of lars (in
terms of the order in which the variables enter the model) using glmnet()?


many thanks,
vito


#Example (from ?glmnet)

set.seed(123)
x=matrix(rnorm(100*20),100,20)
y=rnorm(100)
fit1=glmnet(x,y)
fit1$df #no. of covariates entering the model at different lambdas

#Thus in the "first" model no covariate is included and in the second 
#one 2 covariates (V8 and V20) are included at the same time. Because 
#two variables are included at the same time I do not know which 
#variable (among the selected ones) is more important.

#Everything is fine with lars

o<-lars(x,y)
o$df #the covariates enter one at a time.. V8 is "better" than V20


--

Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli'
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 23895240
fax: 091 485726
http://dssm.unipa.it/vmuggeo



Re: [R] anova.lm F test confusion

2012-03-21 Thread peter dalgaard

On Mar 21, 2012, at 09:04 , Rolf Turner wrote:

> On 21/03/12 20:19, Gerrit Eichner wrote:
>> Dear Ben, or anybody else, of course,
>> 
>> I'd be grateful if you could point me to a reference (different from ch. 4 
>> "Linear models" in "Statistical Models in S" (Chambers & Hastie (1992))) 
>> regarding the (asserted F-)distributional properties of the test statistic 
>> (used, e.g., by anova.lm()) to compare model 1 with model 2 using the MSE of 
>> model 3 in a sequence of three nested (linear) models? (A short 
>> RSiteSearch() and a google search didn't lead me far ...)
>> 
>> Thx in advance!
> 
> A good, if somewhat dry, reference on this is "Theory and Application of the
> Linear Model" by Franklin A. Graybill, Duxbury, 1976.
> 
> There are of course many, *many* other such books.

The whole thing is of course a fairly straightforward consequence of 
Fisher-Cochran's theorem. This says that, in the absence of systematic effects, 
the Sums of Squares in the ANOVA tables are proportional to independent 
chi-square variables. Hence, the ratio of any pair of Mean Squares or pooled 
Mean Squares has an F distribution. This holds for the sort of ANOVA tables 
that decompose the total sum of squares (i.e., not the drop1 or Type II-IV style
of table)

The convention of dividing by the "overall error" term rather than successively 
pooling terms according to model reductions is pervasive both in statistical 
literature and in software. This stems from the emphasis on balanced designs in 
the days before electronic computers: When you know that the sums of squares 
are independent of the testing order, you rather like to have the same property 
for the F tests, so that all conclusions can be conveniently read from a single 
ANOVA table. 

By and large, it doesn't make a whole lot of difference whether you gain a few 
denominator DF. It does, however, imply that the F tests with a common 
denominator are not independent, as opposed to the "proper" successive F tests.
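The common-denominator convention is easy to check numerically; a toy sketch (made-up balanced two-factor data):

```r
## every F value printed by anova() on an lm fit is that row's Mean Sq
## divided by the single Residual Mean Sq at the bottom of the table
set.seed(1)
d <- data.frame(a = gl(3, 20), b = gl(4, 5, length = 60), y = rnorm(60))
tab <- anova(lm(y ~ a + b, data = d))
F_a <- tab["a", "Mean Sq"] / tab["Residuals", "Mean Sq"]
all.equal(F_a, tab["a", "F value"])  # TRUE
```

Pooling terms according to successive model reductions would instead use, e.g., the combined b + Residuals mean square as the denominator for a; anova.lm does not do that.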

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] how calculate seasonal component & cyclic component of time series?

2012-03-21 Thread Berend Hasselman

On 21-03-2012, at 08:10, sagarnikam123 wrote:

> I am new to time series. From what I know up till now, I have uploaded a
> time series file and want to build an ARMA model, but for that I need the
> p & q values (orders).
> Tell me how to calculate the best p & q values to find the best AIC value
> for the model.
> 
> i am doing but giving error
>> bhavar<-read.table(file.choose())  #taking time series file

It's a file of numbers.

>> decompose(bhavar$V1)
> Error in decompose(bhavar$V1) : time series has no or less than 2 periods
> 
> What is this error telling me?


You asked a very similar question 8 days ago.
You were given answers.
If you want to use HoltWinters or decompose you will have to set the frequency 
of your time series to something > 1.
You were told how to do that.
You were also told to have a look at the forecast package.
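For completeness, a minimal sketch of both points, using a synthetic monthly series since the original file isn't shown (the brute-force grid below is illustration only, not a substitute for the forecast package's automated order selection):

```r
## synthetic monthly series with trend + seasonality; frequency = 12 gives
## decompose() the two full periods it needs
set.seed(42)
raw <- 10 + 0.1 * (1:48) + sin(2 * pi * (1:48) / 12) + rnorm(48, sd = 0.2)
bhavar <- ts(raw, frequency = 12)
d <- decompose(bhavar)  # trend, seasonal and random components

## brute-force p & q: smallest AIC over a small grid of ARMA orders
grid <- expand.grid(p = 0:2, q = 0:2)
aics <- apply(grid, 1, function(g)
  tryCatch(AIC(arima(bhavar, order = c(g[["p"]], 0, g[["q"]]))),
           error = function(e) Inf))
grid[which.min(aics), ]  # best-AIC orders for this toy series
```

With the frequency set, HoltWinters(bhavar) works as well.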

Berend



Re: [R] Best way to compute the difference between two levels of a factor ?

2012-03-21 Thread Peter Ehlers

On 2012-03-21 01:48, wphantomfr wrote:

Dear R-help Members,


I am wondering if anyone can think of an optimal way of computing, for
several numeric variables, the difference between 2 levels of a factor.


To be clear let's generate a simple data frame with 2 numeric variables
collected for different subjects (ID) and 2 levels of a TIME factor
(time of evaluation)

data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9))

  ID TIME         X         Y
1 AA   T1  9.959540 11.140529
2 AA   T2 12.949522  9.896559
3 BB   T1  9.039486 13.469104
4 BB   T2 10.056392 14.632169
5 CC   T1  8.706590 14.939197
6 CC   T2 10.799296 10.747609

I want to compute for each subject and each variable (X, Y, ...) the
difference between T2 and T1.

Until today I do it by reshaping my dataframe to the wide format (the
columns are then ID, X.T1, X.T2, Y.T1,Y.T2) and then  compute the
difference between successive  columns one by one :
data$Xdiff=data$X.T2-data$X.T1
data$Ydiff=data$Y.T2-data$Y.T1
...

but this way is probably not optimal if the difference has to be
computed for a large number of variables.

How will you handle it ?


One way is to use the plyr package:

 library(plyr)
 result <- ddply(data, "ID", summarize,
 DIF.X = X[TIME=="T2"] - X[TIME=="T1"],
 DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"])

Peter Ehlers




Thanks in advance

Sylvain Clément




[R] AIC models are not all fitted to the same number of observation

2012-03-21 Thread Patrick Giraudoux

Hi,

Using lme from the package nlme 3.1-103, I get a strange warning. I am
trying to compare two models with:


library(nlme)
lmez6=lme(lepus~vulpes,random=~1|troncon/an,data=ika_z6_test)
lmez60=lme(lepus~1,random=~1|troncon/an,data=ika_z6_test)

Both have the same number of observations and groups:

lmez6
Linear mixed-effects model fit by REML
  Data: ika_z6_test
  Log-restricted-likelihood: -2267.756
  Fixed: lepus ~ vulpes
(Intercept)  vulpes
 1.35017117  0.04722338

Random effects:
 Formula: ~1 | troncon
(Intercept)
StdDev:   0.8080261

 Formula: ~1 | an %in% troncon
(Intercept)  Residual
StdDev:1.086611 0.4440076

Number of Observations: 1350
Number of Groups:
troncon an %in% troncon
1691350


> lmez60
Linear mixed-effects model fit by REML
  Data: ika_z6_test
  Log-restricted-likelihood: -2266.869
  Fixed: lepus ~ 1
(Intercept)
   1.435569

Random effects:
 Formula: ~1 | troncon
(Intercept)
StdDev:   0.8139646

 Formula: ~1 | an %in% troncon
(Intercept)  Residual
StdDev:1.086843 0.4445815

Number of Observations: 1350
Number of Groups:
        troncon an %in% troncon
            169            1350

...but when I want to compare their AIC, I get:

AIC(lmez6,lmez60)
       df      AIC
lmez6   5 4545.511
lmez60  4 4541.737
Warning message:
In AIC.default(lmez6, lmez60) :
  models are not all fitted to the same number of observations


Does anybody have an explanation for this strange warning? To what extent
might it limit the conclusions that can be drawn from the AIC
comparison?


Thanks in advance,

Patrick
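The likely explanation: for REML fits, the effective number of
observations reported by nlme is the number of rows minus the number of
fixed-effect coefficients, so two models with different fixed effects
report different counts and AIC() warns. More fundamentally, REML
likelihoods are not comparable between models whose fixed effects
differ. A sketch of the usual remedy, refitting with method = "ML"; the
data below are simulated and all names and values are invented:

```r
library(nlme)

## Simulated grouped data, purely illustrative
set.seed(42)
d <- data.frame(g = factor(rep(1:30, each = 10)), x = rnorm(300))
d$y <- 1 + 0.3 * d$x + rep(rnorm(30), each = 10) + rnorm(300)

## REML likelihoods are not comparable when the fixed effects differ;
## refit both models with maximum likelihood before comparing AICs.
m1 <- lme(y ~ x, random = ~ 1 | g, data = d, method = "ML")
m0 <- lme(y ~ 1, random = ~ 1 | g, data = d, method = "ML")
AIC(m1, m0)   ## both ML fits count all 300 observations: no warning
```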



Re: [R] Loading Dataset into R continual issue

2012-03-21 Thread Paul Hiemstra
On 03/21/2012 06:35 AM, bobo wrote:
> Thank you. I was able to get it loaded however when I tried to run
>
> mod1<-lm(Pat2006~FHouse)
> I got
> Error in eval(expr, envir, enclos) : object 'Pat2006' not found
>
> What exactly is occurring here?
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Loading-Dataset-into-R-continual-issue-tp4486619p4491424.html
> Sent from the R help mailing list archive at Nabble.com.
>
What is probably going wrong is that Pat2006 and FHouse are columns of a
data.frame rather than free-standing objects. If the columns are in the
same data.frame, say one called df:

mod1 <- lm(Pat2006 ~ FHouse, data = df)

an alternative is to use attach() to make the columns visible as
variables in your workspace:

> speed
Error: object 'speed' not found
> attach(cars)
> speed
 [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25

but I am very much in favor of the first solution using "data =". Using
attach fills up your workspace with a great deal of objects. Keeping the
columns in a data.frame is also better from a design point of view:
having them in one data.frame already groups together variables
(columns) that share a common background.
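A further option, sitting between "data =" and attach(), is with(),
which makes the columns visible only for the duration of a single
expression; a short sketch on the built-in cars data:

```r
## with() evaluates the expression in an environment built from the
## data frame, so columns are found by name without attach() and
## nothing is left behind in the workspace afterwards.
mod1 <- with(cars, lm(dist ~ speed))
coef(mod1)
```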

In addition:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, *reproducible code*.

good luck,
Paul

-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770


