[R] Uploading Google Spreadsheet data into R

2013-11-07 Thread Luca Meyer
Hello,

I am trying to load data I have on a Google Spreadsheet into R to
perform some analysis. I regularly update such data and need to perform
data analysis in the quickest possible way, i.e. without needing to publish
the data, so I was wondering how to make this piece of code work (source
http://www.r-bloggers.com/datagrabbing-commonly-formatted-sheets-from-a-google-spreadsheet-guardian-2014-university-guide-data/)
with my dataset (see
https://docs.google.com/spreadsheet/ccc?key=0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE#gid=0
):

library(RCurl)
gsqAPI = function(key,query,gid=0){
  tmp=getURL( paste( sep="",'https://spreadsheets.google.com/tq?',
'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', gid),
ssl.verifypeer = FALSE )
  return( read.csv( textConnection( tmp ),  stringsAsFactors=F ) )
}
handler=function(key,i){
  tmp=gsqAPI(key,"select * where B!=''", i)
  subject=sub(".Rank",'',colnames(tmp)[1])
  colnames(tmp)[1]="Subject.Rank"
  tmp$subject=subject
  tmp
}
key='0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE'
gdata=handler(key,0)

The code currently returns the following:

Error in `$<-.data.frame`(`*tmp*`, "subject", value = "COL1") :
  replacement has 1 row, data has 0
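That error means gsqAPI() returned a zero-row data frame (the query matched nothing), so assigning the one-element subject column fails. A small reproduction plus a defensive variant of handler() (the guard and its message are my addition, not from the thread):

```r
# Reproduce the failure mode: assigning a scalar column to a zero-row data frame
empty <- data.frame(Subject.Rank = character(0), stringsAsFactors = FALSE)
err <- tryCatch({ empty$subject <- "COL1"; NULL },
                error = function(e) conditionMessage(e))
print(err)  # "replacement has 1 row, data has 0"

# Defensive variant of handler(): fail early with a clear message
safe_handler <- function(key, i, api = gsqAPI) {
  tmp <- api(key, "select * where B!=''", i)
  if (nrow(tmp) == 0)
    stop("query returned no rows; check the key, gid, and column letters")
  subject <- sub(".Rank", "", colnames(tmp)[1])
  colnames(tmp)[1] <- "Subject.Rank"
  tmp$subject <- subject
  tmp
}
```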

Thank you in advance,
Luca


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] deSolve, unresolved namespace error -- solved

2013-11-07 Thread Thomas Petzoldt

We have been able to reproduce the reported issue on another Linux
system: Fedora 19, and the solution was quite simple:

The deSolve package must always be loaded *before* loading the shared
library of the compiled model.
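In code, the working order is simply this (a sketch; `mymod.so` is a hypothetical shared-object name, `mymod.dll` on Windows):

```r
library(deSolve)        # 1. load deSolve first, so its solver symbols exist
dyn.load("mymod.so")    # 2. only then load the compiled model
## 3. the solver can now find the model's routines, e.g.
## ode(y, times, func = "derivs", parms, dllname = "mymod")
```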

Thomas



Re: [R] Underdispersion and count data

2013-11-07 Thread Achim Zeileis

On Thu, 7 Nov 2013, sv.j...@yahoo.ca wrote:


Hello,

I have count data for 4 groups, 2 of which have a large number of zeroes 
and are overdispersed, and the other 2 underdispersed with no zeroes.


Are you sure that it's really underdispersion in addition to the lack of 
zeros? It could also be that due to the missing zeros, there is less 
dispersion.


I have two questions about model fitting, which I am quite new to, and 
have been using mostly the pscl package.


1 - How do I deal with underdispersion? Almost all the published and 
online advice is regarding overdispersion, and neither the Poisson nor 
negative binomial distribution seem appropriate. The COM Poisson comes 
up sometimes as a suggestion, but it's not clear to me how I can use 
this, explain my choice of it, or what information I would report for 
publication purposes.


There are (at least) two packages on CRAN: compoisson and ComPoissonReg 
which support this.


However, I would check first whether this is really needed or maybe a 
zero-truncated Poisson model is already sufficient.


The package "countreg" on R-Forge 
(https://R-Forge.R-project.org/R/?group_id=522) has a function zerotrunc() 
which is essentially the same code that hurdle() in "pscl" uses. So it 
should be easy to use for you.
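A hedged usage sketch (assumes countreg has been installed from R-Forge; the formula and data are placeholders, and the `dist` argument follows the package's documented interface):

```r
## install.packages("countreg", repos = "https://R-Forge.R-project.org")
library(countreg)
fit <- zerotrunc(y ~ x1 + x2, data = mydata, dist = "poisson")
summary(fit)
```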


2 - For the overdispersed data with lots of zeroes, I've tried 
zero-inflated Poisson and NegBin and hurdle models, and used the Vuong 
test to compare. However, I get equal fit for two candidate models that 
produce quite different coefficient estimates for my predictor 
variables, and hence different p values. I am unsure how to proceed in 
choosing one of these models, and how I would justify one over the other 
given that the Vuong test seems not to discriminate.


Is it just zero-inflated vs. hurdle, or are there also differences in the 
regressors? If the former: zero-inflated and hurdle models are parametrized 
differently but often lead to similar fits. The former has a count 
part plus a zero-inflation part, whereas the latter has a zero-truncated 
count part and a zero hurdle.


If the regressors are different, then it's probably a subject-matter 
decision.


hth,
Z


Thank you and any advice would be much appreciated.

Mo



[R] help plotting two-dimensional ADMIXTURE data in R

2013-11-07 Thread Ella Bowles
Hello,

I am trying to plot ADMIXTURE output in R. My output is two-dimensional
(two columns, 190 rows). These rows represent individuals, and groups of
them (e.g., 1-12) represent populations. What I'm hoping for is a plot with
eleven different colours, basically like a STRUCTURE plot.

# This code works, but plots with only two colours, not separating the
# populations.
# To show how I'm reading in my data:
tbl=read.table("filepath to file",sep=" ")

#plotting

barplot(t(as.matrix(tbl)), col=rainbow(4), xlab="Individual #",
ylab="Ancestry", border=NA)

###I have tried to amend this with the following code:

barplot(t(as.matrix(tbl[c(0:12, 13:32, 33:51, 52:69, 70:87, 88:104,
105:118, 119:137, 138:157, 158:172, 173:190), ])), col=c("darkgreen",
"darkblue", "orange", "cyan", "yellow", "chartreuse", "blue", "cyan4",
"red", "violet", "blueviolet"), xlab="Individual #", ylab="Ancestry",
border=NA)

#but this code really still says the same thing as the first, i.e., c(1:190)

###Alternatively, this code separates the colours

barplot(t(as.matrix(tbl))[1], col=c("darkgreen", "darkblue", "orange",
"cyan", "yellow", "chartreuse", "blue", "cyan4", "red", "violet",
"blueviolet") [c(rep(1,length(1:12)),rep(2,length(13:32)),
rep(3,length(33:51)), rep(4,length(52:69)), rep(5,length(70:87)),
rep(6,length(88:104)), rep(7,length(105:118)), rep(8,length(119:137)),
rep(9,length(138:157)), rep(10,length(158:172)), rep(11,length(173:190)))],
xlab="Individual #", ylab="Ancestry", border=NA)

# But this only plots in one dimension. I've tried so many permutations of
# the concepts in each of these, and I just can't get it.

I've tried available code on the net, but I think my R skills aren't
advanced enough to sort out their tricks. Any suggestions would be much
appreciated.
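A note on why only two colours appear: barplot() applies col to the stacked rows of t(as.matrix(tbl)), i.e. to the ancestry components within each bar, not to groups of individuals. A common STRUCTURE-style workaround keeps one colour per component and marks population boundaries with separator lines. A sketch on simulated two-component data (the population break points are taken from the index ranges in the post; this is my illustration, not code from the thread):

```r
set.seed(1)
q1  <- runif(190)                 # simulated ancestry proportions, 190 individuals
tbl <- cbind(q1, 1 - q1)          # two ancestry components per individual

## last individual of each of the first ten populations (ranges from the post)
pop.ends <- c(12, 32, 51, 69, 87, 104, 118, 137, 157, 172)

mids <- barplot(t(tbl), col = c("darkgreen", "darkblue"),
                xlab = "Individual #", ylab = "Ancestry", border = NA)
## white separators between the last bar of one population and the next
abline(v = (mids[pop.ends] + mids[pop.ends + 1]) / 2, lwd = 2, col = "white")
```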

Sincerely, Ella

-- 
Ella Bowles
PhD Candidate
Biological Sciences
University of Calgary

e-mail: ebow...@ucalgary.ca, bowl...@gmail.com
website: http://ellabowlesphd.wordpress.com/



[R] Underdispersion and count data

2013-11-07 Thread sv.j...@yahoo.ca
Hello,

I have count data for 4 groups, 2 of which have a large number of zeroes
and are overdispersed, and the other 2 underdispersed with no zeroes. I
have two questions about model fitting, which I am quite new to, and have
been using mostly the pscl package.

1 - How do I deal with underdispersion? Almost all the published and online
advice is regarding overdispersion, and neither the Poisson nor negative
binomial distribution seem appropriate. The COM Poisson comes up sometimes
as a suggestion, but it's not clear to me how I can use this, explain my
choice of it, or what information I would report for publication purposes.

2 - For the overdispersed data with lots of zeroes, I've tried 
zero-inflated Poisson and NegBin and hurdle models, and used the Vuong 
test to compare. However, I get equal fit for two candidate models that 
produce quite different coefficient estimates for my predictor 
variables, and hence different p values. I am unsure how to proceed in 
choosing one of these models, and how I would justify one over the other given 
that the Vuong test seems not to discriminate.

Thank you and any advice would be much appreciated.

Mo



Re: [R] Finding absolute viewport location in grid / lattice

2013-11-07 Thread Paul Murrell

Hi

Take a look at current.transform() (the grid.locator() function shows an 
example use).
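A minimal sketch of that approach (my illustration, not code from the thread), using James's lattice example: after focusing a panel, map the viewport's corner from inches in the current viewport to inches on the device with current.transform(), dividing by the homogeneous coordinate.

```r
library(grid)
library(lattice)

pdf(tempfile(fileext = ".pdf"), width = 7, height = 7)
print(xyplot(Petal.Length ~ Sepal.Length | Species, iris, layout = c(2, 2)))
trellis.focus("panel", 1, 1, highlight = FALSE)

## lower-left corner of the focused panel, in inches on the device:
## multiply by the 3x3 transform, then divide by the homogeneous coordinate
pt <- c(convertX(unit(0, "npc"), "inches", valueOnly = TRUE),
        convertY(unit(0, "npc"), "inches", valueOnly = TRUE), 1) %*%
  current.transform()
corner <- pt[1:2] / pt[3]

trellis.unfocus()
dev.off()
corner   # (x, y) of the panel's lower-left corner, in inches from bottom-left
```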


Paul

On 11/06/13 13:35, James Price wrote:

I'm trying to do some post-plot manipulation of some lattice
graphics, in which I need to get the absolute viewport locations on
the plotting device. So for example:

library(lattice)
print(xyplot(Petal.Length ~ Sepal.Length | Species, iris, layout = c(2, 2)))
trellis.focus('panel', 1, 1)

This shows the item I want to identify. I need to know, for example,
that the lower left corner of the highlighted panel is at (0.1, 0.1)
of the plotting device. Hopefully there's some grid shenanigans I can
use to do so.

To put the question into context (in case there's an alternative
way): I'm constructing PDF reports that are an amalgam of PDF
figures, assembled through LaTeX that is programmatically constructed in R,
and I'd like to be able to add tooltips to some of the data points on
the graphs. To do so, I'm overlaying a tikz (empty) image that uses
the pdfcomment package to add the tooltips. Hence I need to be able
to calculate where some of the points are on the screen so I can
calculate where to put the tooltip 'hit-zones'. This would be fairly
trivial using javascript / actionscript and mouseover hooks, but I'm
constrained by PDF.

Thanks, Jim Price. Strength in Numbers.




--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/



[R] Log-logistic 4-parameter model: fitting multiple curves, specific estimates give NaN; when fit alone, all parameters estimated

2013-11-07 Thread Breanne Tidemann
Hello,
I am fitting curves in drc to a Log-logistic 4-parameter model.

My experiment is looking at fall and spring applications of a herbicide on
wild oat at 5 different locations.  A preliminary mixed model ANOVA showed
that timing was significant.  I am running a dose response curve for each
application timing at each location.  However, when I run them all
together, my one fall location has estimates for b, c, d and e, but the
standard errors for all of them are NaN.  When I run only the fall location
on its own, there is no problem in getting standard errors for all my
parameters.

What is the difference between running it alone and running it with other
curves?  How can I fix the NaN standard errors in the multiple-curve fit, as
I need to be able to compare ED50s?
The data for the curve alone is below.  When I run it with the rest of the
curves they are separated by biotype.  Dependent variable is square root
biomass (srbiomass) and treatment (Trt) is the independent variable.

Location Timing biotypeno Trt Rep Trtno biomass srbiomass
Kinsella Fall 5   0 1 1  21.2    4.604346
Kinsella Fall 5   0 2 1 168.72  12.98923
Kinsella Fall 5   0 3 1  86.76   9.314505
Kinsella Fall 5   0 4 1  14.3    3.781534
Kinsella Fall 5  50 1 2  12.7    3.563706
Kinsella Fall 5  50 2 2  46.44   6.81469
Kinsella Fall 5  50 3 2   9.84   3.136877
Kinsella Fall 5  50 4 2  32.38   5.690343
Kinsella Fall 5 100 1 3  10.3    3.209361
Kinsella Fall 5 100 2 3  41.84   6.468385
Kinsella Fall 5 100 3 3   0      0
Kinsella Fall 5 100 4 3   4.32   2.078461
Kinsella Fall 5 150 1 4   3.84   1.959592
Kinsella Fall 5 150 2 4   9.34   3.056141
Kinsella Fall 5 150 3 4  25.9    5.089204
Kinsella Fall 5 150 4 4   1.46   1.208305
Kinsella Fall 5 200 1 5   6.6    2.569047
Kinsella Fall 5 200 2 5   0      0
Kinsella Fall 5 200 3 5  72.94   8.540492
Kinsella Fall 5 200 4 5  24.56   4.955805
Kinsella Fall 5 300 1 6   6.36   2.521904
Kinsella Fall 5 300 2 6   3.6    1.897367
Kinsella Fall 5 300 3 6  31.28   5.592853
Kinsella Fall 5 300 4 6   1.46   1.208305
Kinsella Fall 5 400 1 7   2.02   1.421267
Kinsella Fall 5 400 2 7  11.2    3.34664
Kinsella Fall 5 400 3 7   3.02   1.737815
Kinsella Fall 5 400 4 7   0      0
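For reference, a joint dose-response fit across curves in drc is specified with curveid; when the joint fit gives NaN standard errors, one diagnostic is to compare it against a separate fit per curve. A hedged sketch with placeholder names (`dat` stands for the full data set; column names follow the post):

```r
library(drc)

## joint fit across locations/timings (placeholder data frame `dat`)
fit.all <- drm(srbiomass ~ Trt, curveid = Location, data = dat, fct = LL.4())
summary(fit.all)   # NaN standard errors often signal a poorly identified curve

## the problem curve fitted alone, for comparison
fit.one <- drm(srbiomass ~ Trt,
               data = subset(dat, Location == "Kinsella" & Timing == "Fall"),
               fct = LL.4())
ED(fit.one, 50)    # ED50 with its standard error
```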

Thanks in advance for any help.
Breanne



Re: [R] Sorting Data Frames in R by multiple columns with a custom order

2013-11-07 Thread arun
If you already have the order stored in a list or so:
For example:
dat1 <- as.data.frame(mat,stringsAsFactors=FALSE)

lst1 <- list(c("OF","ON"), c("US","UK", "WW","BO","BR","CA"), 
c("P2","P3","P1"),c("S2","S1","S3"))
 dat1[] <- lapply(seq_along(lst1),function(i) 
factor(dat1[,i],levels=lst1[[i]])) 
 dat1[with(dat1,order(c4,c1,c2,c3)),]
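The same custom ordering can also be written with match(); despite the concern in the question, duplicated values are fine because order() sorts on the match positions, not on first matches alone. A sketch using the example data from the question:

```r
c1 <- c("CA","CA","CA","BR","BR","UK","UK","BO","BO","BO","WW","WW","WW","US","US")
c2 <- c("P3","P2","P1","P1","P2","P2","P1","P1","P3","P2","P1","P2","P3","P3","P2")
c3 <- c("S1","S1","S2","S2","S2","S1","S2","S1","S1","S1","S1","S2","S3","S1","S1")
c4 <- c("ON","ON","OF","ON","OF","OF","OF","ON","ON","ON","OF","ON","ON","ON","ON")
mat <- cbind(c4, c1, c2, c3)

## sort on the positions of each value within its custom level order
ord <- order(match(c4, c("OF","ON")),
             match(c1, c("US","UK","WW","BO","BR","CA")),
             match(c2, c("P2","P3","P1")),
             match(c3, c("S2","S1","S3")))
mat[ord, ]   # first row: "OF" "UK" "P2" "S1"
```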

A.K.


On Thursday, November 7, 2013 5:47 PM, arun  wrote:
Hi,

Not sure whether this helps:
dat1 <- as.data.frame(mat,stringsAsFactors=FALSE)
dat1$c4 <- factor(dat1$c4,levels=c("OF","ON"))
 dat1$c1 <- factor(dat1$c1,levels=c("US","UK","WW","BO","BR","CA"))
 dat1$c2 <- factor(dat1$c2, levels=c("P2","P3","P1"))
 dat1$c3 <- factor(dat1$c3, levels=c("S2","S1","S3"))
 dat1[with(dat1,order(c4,c1,c2,c3)),]

A.K.




Thank you guys for the help here, and my apologies if this has been answered in 
the posts already somewhere which I just was not able to 
find. 

I am trying to sort a data frame by multiple 
columns.  Each column has different values of interest by which I am 
trying to sort.  I have been able to do this alphabetically in both 
ascending and descending order using the order function.  However, for 
the document I am trying to create it is crucial that the order is not 
alphabetical.  In fact it is by a specific ordering, on each of the 
columns, which I need to specify. 

I could do this via a nasty convolution of creating sort order variables for 
each of the columns and then merging on to the data frame by the cols, then 
ordering on the new sort order cols, then 
remove them...however, I was hoping that there would be a nice and easy 
automated way to handle this in case the order changes at another time. 

so, here's an example 
c1 = c('CA', 'CA', 'CA', 'BR', 'BR', 'UK', 'UK', 'BO', 'BO', 'BO', 'WW', 'WW', 
'WW', 'US', 'US') 
c2 = c('P3', 'P2', 'P1', 'P1', 'P2', 'P2', 'P1', 'P1', 'P3', 'P2', 'P1', 'P2', 
'P3', 'P3', 'P2') 
c3 = c('S1', 'S1', 'S2', 'S2', 'S2', 'S1', 'S2', 'S1', 'S1', 'S1', 'S1', 'S2', 
'S3', 'S1', 'S1') 
c4 = c('ON', 'ON', 'OF', 'ON', 'OF', 'OF', 'OF', 'ON', 'ON', 'ON', 'OF', 'ON', 
'ON', 'ON', 'ON') 

mat = cbind(c4, c1, c2, c3) 

if we sort as usual we'd get 
"OF"    "BR"    "P2"    "S2" 
"OF"    "CA"    "P1"    "S2" 
"OF"    "UK"    "P1"    "S2" 
"OF"    "UK"    "P2"    "S1" 
"OF"    "WW"    "P1"    "S1" 
"ON"    "BO"    "P1"    "S1" 
"ON"    "BO"    "P2"    "S1" 
"ON"    "BO"    "P3"    "S1" 
"ON"    "BR"    "P1"    "S2" 
"ON"    "CA"    "P2"    "S1" 
"ON"    "CA"    "P3"    "S1" 
"ON"    "US"    "P2"    "S1" 
"ON"    "US"    "P3"    "S1" 
"ON"    "WW"    "P2"    "S2" 
"ON"    "WW"    "P3"    "S3" 


however I want OF in col 1 to come first... then in col 2 I want US, UK, WW, 
BO, BR, and then CA.  Then in col 3 we need P2, then P3, then P1, and finally in 
col 4 I need S2, then S1, and then finally S3.  As such: 

"OF"    "UK"    "P2"    "S1" 
"OF"    "UK"    "P1"    "S2" 
"OF"    "WW"    "P1"    "S1" 
"OF"    "BR"    "P2"    "S2" 
"OF"    "CA"    "P1"    "S2" 
"ON"    "US"    "P2"    "S1" 
"ON"    "US"    "P3"    "S1" 
"ON"    "WW"    "P2"    "S2" 
"ON"    "WW"    "P3"    "S3" 
"ON"    "BO"    "P2"    "S1" 
"ON"    "BO"    "P3"    "S1" 
"ON"    "BO"    "P1"    "S1" 
"ON"    "BR"    "P1"    "S2" 
"ON"    "CA"    "P2"    "S1" 
"ON"    "CA"    "P3"    "S1" 

I've tried nesting orders in orders. The match 
function looks like it might work, if it wasn't for the fact that in my 
actual data each col can have multiple records for each value and match 
only looks for the first matching case... that said, the sort order is 
unique. 

Furthermore, these are not the real data, just an example. 

Anything that might get me further along the way than I am now would help. 

Thanks



[R] Earth (MARS) package with categorical predictors

2013-11-07 Thread Chris Wilkinson
It appears to be legitimate to include multi-level categorical and continuous 
variables in defining the model for earth (e.g. y ~ cat + cont1 + cont2), but is 
it also possible to use categoricals in the predict method with the earth 
result? I tried, but it returns an error which is not very informative.
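Without seeing the actual error it is hard to be sure, but a frequent cause with predict() and factors is newdata whose factor columns lack the training levels. A hedged sketch on simulated data (all variable names here are my own, not from the thread; assumes the earth package is installed):

```r
library(earth)
set.seed(1)
d <- data.frame(cont1 = rnorm(60), cont2 = rnorm(60),
                cat   = factor(sample(c("a", "b", "c"), 60, replace = TRUE)))
d$y <- d$cont1 + as.numeric(d$cat) + rnorm(60)

fit <- earth(y ~ cat + cont1 + cont2, data = d)

## newdata factors must carry the same level set as the training data
nd <- data.frame(cont1 = 0, cont2 = 0,
                 cat   = factor("b", levels = levels(d$cat)))
p <- predict(fit, newdata = nd)
```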

Thanks

Chris



Re: [R] Merging two dataframes with a condition involving variables of both dataframes

2013-11-07 Thread Collin Lynch
You might need to implement it as a nested pair of for loops using rbind.
In essence, iterate over the rows in df1 and each time find the matching
row in df2.  If none is found, add the df1 row by itself to the
result.  If one is found, remove it from df2 and rbind both of them.  Once
done, just merge in all rows that remain in df2.

This would likely be slower than an SQL-based method but is essentially the
same algorithm.

You can find advice on for-loops in R here:
http://paleocave.sciencesortof.com/2013/03/writing-a-for-loop-in-r/
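The algorithm above can be sketched as follows, using df1 and df2 from Marc's original question (my illustration; a variant that collects matching row indices first and binds once, rather than rbind-ing inside the loop):

```r
df1 <- data.frame(var1 = c("a","b","d","e","g"), var2 = c(25, 14, 53, 26, 84),
                  infodf1 = 1, stringsAsFactors = FALSE)
df2 <- data.frame(var1 = c("a","a","c","d","e","h","i"),
                  var3 = c(10, 32, 14, 55, 2, 53, 6),
                  var4 = c(40, 37, 54, 70, 30, 98, 10),
                  infodf2 = 2, stringsAsFactors = FALSE)

idx1 <- idx2 <- integer(0)
used <- rep(FALSE, nrow(df2))
for (i in seq_len(nrow(df1))) {
  ## rows of df2 satisfying both the equality and the interval condition
  j <- which(!used & df2$var1 == df1$var1[i] &
               df2$var3 <= df1$var2[i] & df1$var2[i] <= df2$var4)
  idx1 <- c(idx1, i)
  if (length(j)) {
    used[j[1]] <- TRUE
    idx2 <- c(idx2, j[1])
  } else {
    idx2 <- c(idx2, NA_integer_)      # unmatched df1 row
  }
}
## full outer join: append the df2 rows that never matched
idx1 <- c(idx1, rep(NA_integer_, sum(!used)))
idx2 <- c(idx2, which(!used))
df3 <- cbind(df1[idx1, ], df2[idx2, ])
nrow(df3)   # 10 rows, matching the desired df3 in the original question
```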

Best,
Collin.



Re: [R] Problems loading xlsx

2013-11-07 Thread David Winsemius

On Nov 7, 2013, at 12:55 PM, Barrett Gady wrote:

> does anyone know how to resolve Error: .onLoad failed in loadNamespace() for 
> ‘xlsxjars’  
> 
> I can’t seem to get this package to load on my windows 7 install of R3.02

It appears that you did not install all the dependencies for package xlsx.

http://cran.r-project.org/web/packages/xlsx/index.html
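A hedged first step (my suggestion, not part of the original reply; note that this .onLoad error can also come from rJava when the installed Java's architecture, 32- vs 64-bit, does not match R's):

```r
## reinstall with dependencies so xlsxjars and rJava come along
install.packages("xlsx", dependencies = TRUE)
library(xlsx)
```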


-- 

David Winsemius
Alameda, CA, USA



Re: [R] prod and F90 product

2013-11-07 Thread Duncan Murdoch

On 13-11-07 6:39 PM, Filippo wrote:

Hi,
I'm having strange differences between the R function prod and the F90
function product.
Processing the same vector C (see attachment). I get 2 different results:
prod(C) = 1.069678e-307
testProd(C) = 0

where testProd is the following wrapping function:

testProd <- function(x) {
  return(.Fortran('testProd', as.double(x), as.double(0),
                  as.double(0), as.integer(length(x))))
}

subroutine testProd(x, p, q,  n)
  implicit none
  integer, intent (in) :: n
  double precision, intent (in) :: x(n)
  double precision, intent (out) :: p
  double precision, intent (out) :: q
  integer :: i

  p = product(x)
  q=1
  do i = 1, n
  q = q*x(i)
  end do
end subroutine testProd

I checked the smallest representable number and it seems to be the same for
both R and F90.
Can anyone help me understand this behaviour?


Some intermediate results may be stored with 80-bit precision in R, 64-bit 
precision in Fortran.
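The effect can be illustrated inside R itself by comparing prod(), whose C loop may accumulate in extended precision, against an explicit double-by-double product (my illustration; the extended-precision behaviour is platform-dependent, typically x86 long doubles):

```r
x <- c(1e-300, 1e-300, 1e300)   # true product is 1e-300
p1 <- prod(x)          # may survive: the intermediate 1e-600 fits in an 80-bit accumulator
p2 <- Reduce(`*`, x)   # 0: the intermediate 1e-600 underflows in 64-bit doubles
c(p1, p2)
```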


Duncan Murdoch



Re: [R] prod and F90 product

2013-11-07 Thread Bert Gunter
Fortune !

Bert

Sent from my iPhone -- please excuse typos.

> On Nov 7, 2013, at 3:48 PM, Rolf Turner  wrote:
> 
>> On 11/08/13 12:39, Filippo wrote:
>> Hi,
>> I'm having strange differences between the R function prod and the F90 
>> function product.
>> Processing the same vector C (see attachment). I get 2 different results:
>> prod(C) = 1.069678e-307
>> testProd(C) = 0
> 
> 
> 
> If you are worried about the difference between 0 and 1.069678e-307 then you 
> probably
> shouldn't be using computers.
> 
>cheers,
> 
>Rolf Turner
> 


[R] Merging two dataframes with a condition involving variables of both dataframes

2013-11-07 Thread Marc Marí Dell'Olmo
Dear all,

I would like to merge two dataframes using two conditions. For example, if
I have the dataframes df1 and df2:

> (df1 <- data.frame(var1=c("a","b","d","e","g"), var2=c(25,14,53,26,84),
infodf1=c(1,1,1,1,1)))

  var1 var2 infodf1
1a   25   1
2b   14   1
3d   53   1
4e   26   1
5g   84   1

> (df2 <- data.frame(var1=c("a", "a", "c", "d", "e", "h","i"), var3=c(10,
32, 14,55,2,53,6), var4=c(40,37, 54,70,30,98,10), infodf2=c(2,2,2,2,2,2,2)))

  var1 var3 var4 infodf2
1a   10   40   2
2a   32   37   2
3c   14   54   2
4d   55   70   2
5e2   30   2
6h   53   98   2
7i6   10   2



I would like to obtain a new dataframe df3 merging df1 and df2 if var1(of
df1)==var1(of df2) and if var3(of df2)<=var2(of df1)<=var4(of df2).
Moreover, I would like to obtain a new data frame with all rows from df1
and df2 (i.e. full outer join:
http://www.w3schools.com/sql/sql_join_full.asp)

df3 should be:

  var1 var2 infodf1 var1 var3 var4 infodf2
1a   25   1a   10   40   2
2e   26   1e2   30   2
3b   14   1   NA   NA   NA  NA
4d   53   1   NA   NA   NA  NA
5g   84   1   NA   NA   NA  NA
6   NA   NA  NAa   32   37   2
7   NA   NA  NAc   14   54   2
8   NA   NA  NAd   55   70   2
9   NA   NA  NAh   53   98   2
10  NA   NA  NAi6   10   2


I cannot use "merge" because this function doesn't allow conditions.


On the other hand, I have tried SQL code using the package sqldf
with the following R syntax:

> (df3 <- sqldf("select a.*, b.* FROM df1 a, df2 b WHERE a.var1 = b.var1
AND b.var3 <= a.var2 AND a.var2 <= b.var4 ") )

  var1 var2 infodf1 var1 var3 var4 infodf2
1a   25   1a   10   40   2
2e   26   1e2   30   2



But sqldf doesn't support the "full outer join" option:

> (df3 <- sqldf("select a.*, b.* FROM df1 a full outer join df2 b WHERE
a.var1 = b.var1 AND b.var3 <= a.var2 AND a.var2 <= b.var4 ") )

Error in sqliteExecStatement(con, statement, bind.data) :

  RS-DBI driver: (error in statement: RIGHT and FULL OUTER JOINs are not
currently supported)
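SQLite (sqldf's default backend) can emulate a full outer join as a LEFT JOIN unioned with the anti-joined other side. A hedged sketch, reusing df1 and df2 as defined above (the second SELECT keeps the a.*, b.* column order of the first; duplicated column names may be renamed by sqldf):

```r
library(sqldf)
df3 <- sqldf("SELECT a.*, b.* FROM df1 a
              LEFT JOIN df2 b
                ON a.var1 = b.var1 AND b.var3 <= a.var2 AND a.var2 <= b.var4
              UNION ALL
              SELECT a.*, b.* FROM df2 b
              LEFT JOIN df1 a
                ON a.var1 = b.var1 AND b.var3 <= a.var2 AND a.var2 <= b.var4
              WHERE a.var1 IS NULL")
```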


Does anyone know how I can solve my problem?

thank you very much!

Marc



Re: [R] prod and F90 product

2013-11-07 Thread Rolf Turner

On 11/08/13 12:39, Filippo wrote:

Hi,
I'm having strange differences between the R function prod and the F90 
function product.

Processing the same vector C (see attachment). I get 2 different results:
prod(C) = 1.069678e-307
testProd(C) = 0




If you are worried about the difference between 0 and 1.069678e-307 then 
you probably shouldn't be using computers.

cheers,

Rolf Turner



[R] prod and F90 product

2013-11-07 Thread Filippo

Hi,
I'm having strange differences between the R function prod and the F90 
function product.

Processing the same vector C (see attachment). I get 2 different results:
prod(C) = 1.069678e-307
testProd(C) = 0

where testProd is the following wrapping function:

testProd <- function(x) {
    return(.Fortran('testProd', as.double(x), as.double(0),
                    as.double(0), as.integer(length(x))))
}

subroutine testProd(x, p, q,  n)
implicit none
integer, intent (in) :: n
double precision, intent (in) :: x(n)
double precision, intent (out) :: p
double precision, intent (out) :: q
integer :: i

p = product(x)
q=1
do i = 1, n
q = q*x(i)
end do
end subroutine testProd

I checked the smallest representable number and it seems to be the same for
both R and F90.

Can anyone help me understand this behaviour?
Thank you in advance
Regards,
Filippo

c(0.126667774727727, 0.126778508335206, 0.126889731437094, 0.127001445486345, 
0.127113651944097, 0.127226352279715, 0.127339547970833, 0.127453240503398, 
0.127567431371714, 0.127682122078482, 0.127797314134848, 0.127913009060448, 
0.128029208383447, 0.128145913640589, 0.128263126377241, 0.128380848147438, 
0.128499080513928, 0.128617825048222, 0.128737083330636, 0.128856856950339, 
0.128977147505404, 0.129097956602849, 0.129219285858691, 0.129341136897992, 
0.129463511354905, 0.129586410872726, 0.129709837103945, 0.12983379171029, 
0.129958276362782, 0.130083292741782, 0.130208842537045, 0.130334927447768, 
0.130461549182643, 0.130588709459909, 0.130716410007402, 0.130844652562612, 

Re: [R] Error running MuMIn dredge function using glmer models

2013-11-07 Thread Ben Bolker
Martin Turcotte  gmail.com> writes:

> Dear list, 
> I am trying to use MuMIn to compare all possible mixed models 
> using the dredge function on binomial data but I
> am getting an error message that I cannot decode. This error 
> only occurs when I use glmer. When I use an lmer
> analysis on a different response variable everything works great. 
> 
> Example using a simplified glmer model
> global model: 
> mod<- glmer(cbind(st$X2.REP.LIVE, st$X2.REP.DEAD) ~ 
> DOMESTICATION*GLUC + (1|PAIR), data=st,
> na.action=na.omit , family=binomial) 
> 
> The response variables are the number of survival and dead insects
> (successes and failures)
> DOMESTICATION is a 2 level factor.
> GLUC is a continuous variable.
> PAIR is coded as a factor or character (both ways fail).
> 
> This model functions correctly but when I try it with dredge()
> I get an error. 
> 
> g<- dredge(mod, beta=F, evaluate=F, rank='AIC')
> Error in sprintf(gettext(fmt, domain = domain), ...) : 
>   invalid type of argument[1]: 'symbol'
> 
> When I try with another rank the same thing happens:
> chat<- deviance(mod)/58
> g<- dredge(mod, beta=F, evaluate=F, rank='QAIC', chat=chat)
> Error in sprintf(gettext(fmt, domain = domain), ...) : 
>   invalid type of argument[1]: 'symbol'
> 
> Any suggestions would be greatly appreciated
> 
> thanks
> 
> Martin Turcotte, Ph. D.
> mart.turcotte  gmail.com

* Please send follow-ups to r-sig-mixed-mod...@r-project.org
* I think we need a reproducible example: it works OK for the
first two binomial GLMMs I tried:

library(lme4)
library(MuMIn)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
   data = cbpp, family = binomial)
dredge(gm1)
dredge(gm1,beta=FALSE,evaluate=FALSE)

library(mlmRev)
gm2 <- glmer(use ~ urban+age+livch+(1|district), Contraception, binomial)
dredge(gm2)
dredge(gm2,beta=FALSE,evaluate=FALSE)

* It might be worth removing the 'st$' from  your response variable,
e.g.

mod<- glmer(cbind(X2.REP.LIVE, X2.REP.DEAD) ~  DOMESTICATION*GLUC + (1|PAIR),
data=st,family=binomial) 

* What are the results of traceback()?  sessionInfo() ?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problems loading xlsx

2013-11-07 Thread Barrett Gady
Does anyone know how to resolve this error:
Error: .onLoad failed in loadNamespace() for 'xlsxjars'

I can't seem to get this package to load on my Windows 7 install of R 3.0.2.



Sent from Windows Mail
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
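[Editor's note: this .onLoad error almost always means the rJava dependency failed to initialize, rather than anything in xlsx itself. A hedged, environment-dependent troubleshooting sketch; the steps below are common suggestions, not a guaranteed fix:]

```r
# xlsxjars loads Java archives through rJava, so rJava must load first.
# On Windows this usually fails when no Java is installed, or when the
# Java architecture (32- vs 64-bit) does not match the running R build.
Sys.getenv("JAVA_HOME")        # if set, it must point at a matching JRE/JDK
# Sys.setenv(JAVA_HOME = "")   # optionally let rJava locate Java by itself
install.packages(c("rJava", "xlsxjars", "xlsx"))
library(rJava)                 # if this line fails, fix Java, not xlsx
library(xlsx)
```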


Re: [R] R newbie - loop questions

2013-11-07 Thread Karl Schelhammer
Yep, that solves the problem and is much cleaner than what I was trying to do.

KWS

On Nov 7, 2013, at 10:01 AM, arun  wrote:

> Hi,
> 
> May be this is what you wanted:
> mat1 <- matrix(1:4,2,2)
> results.norm <- mat1/rowSums(mat1)
> 
> A.K.
> 
> 
> 
> 
> On Thursday, November 7, 2013 12:27 PM, Karl Schelhammer 
>  wrote:
> Sorry for the confusion.  results is a 2 x 2 matrix containing real positive 
> values.  The goal is to divide each element of the matrix by the sum of the 
> elements in a row and store the result in results.norm.  However, the loop 
> returns an error that I don't understand.  Any thoughts?
> 
> KWS
> 
> On Nov 7, 2013, at 8:42 AM, smartpink...@yahoo.com wrote:
> 
>> 
>> This is not clear.
>> For example, 
>> v1 <- 24 #value
>> r1 <- c(2,3) # vector with 2 elements
>> my.function(v1,r1)
>> #[1] 4.8
>> 
>> There is no reference to "results".  If it is the output of my.function(), 
>> then there is only one element.  Anyway, please show a reproducible example.
>> 
>> 
>> 
>> 
>> Greetings all, I am attempting to run the following code:
>> 
>> 
>> I recieve the following error:
>> 
>> The function runs fine when I explicitly enter the subscripts in results and
>> pass to my.function.  Can someone else see what is wrong with this approach?
>> 
>> Quoted from: 
>> http://r.789695.n4.nabble.com/R-newbie-loop-questions-tp4679975.html
>> 
>> 
>> _
>> Sent from http://r.789695.n4.nabble.com
>> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
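[Editor's footnote to the solution above: base R's prop.table() performs the same row-wise normalization without relying on vector recycling, which some readers find easier to audit. A short sketch:]

```r
mat1 <- matrix(1:4, 2, 2)

# rowSums(mat1) has length nrow(mat1), so it is recycled down each
# column and element [i, j] is divided by the sum of row i
results.norm <- mat1 / rowSums(mat1)

# prop.table() expresses the same operation directly
stopifnot(all.equal(results.norm, prop.table(mat1, margin = 1)))
rowSums(results.norm)   # each row now sums to 1
```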


Re: [R] Checking datetime value

2013-11-07 Thread arun


Hi,
You can check the dataset for any patterns that are unique for the datetime 
values. 
For example:
vec1 <- c(as.character(Sys.time()),
          format(Sys.time(), "%a %b %d %X %Y %Z"),
          4324343,
          format(Sys.time(), "%m-%d-%Y %H:%M:%S"),
          "aZZCRDDS")
vec1[grepl(":", vec1)]
#[1] "2013-11-07 15:04:27" "Thu Nov 07 03:04:27 PM 2013 EST"
#[3] "11-07-2013 15:04:27"    

A.K.


On Thursday, November 7, 2013 10:12 AM, vikrant  wrote:
Hi ,
I would like to check if a column contains datetime values. (column may
contain any datetime format or just numbers or string). Request your help 




--
View this message in context: 
http://r.789695.n4.nabble.com/Checking-datetime-value-tp4679963.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
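[Editor's note: a more robust alternative to grepping for ":" is to attempt parsing under a list of candidate formats and keep the values that parse. A sketch; the helper name and the format list are illustrative assumptions to be extended for your data:]

```r
looks_like_datetime <- function(x,
    formats = c("%Y-%m-%d %H:%M:%S", "%m-%d-%Y %H:%M:%S", "%Y-%m-%d")) {
  # an entry "looks like" a datetime if at least one format parses it
  parsed <- lapply(formats,
                   function(f) as.POSIXct(x, format = f, tz = "UTC"))
  Reduce(`|`, lapply(parsed, function(p) !is.na(p)))
}

vec1 <- c("2013-11-07 15:04:27", "4324343",
          "11-07-2013 15:04:27", "aZZCRDDS")
looks_like_datetime(vec1)
```

This can still mis-fire on strings of bare digits that happen to parse under a permissive format, so tighten the format list for your data.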




Re: [R] Nonnormal Residuals and GAMs

2013-11-07 Thread Simon Wood

If you use GCV smoothness selection then, in the Gaussian case, the key
assumptions are constant variance and independence. As with linear
modelling, the normality assumption only comes in when you want to find
confidence intervals or p-values. (The Gauss-Markov theorem does not require
normality, by the way, but I don't know whether it has a penalized analogue.)


With REML smoothness selection it's less clear (at least to me).

Beyond Gaussian the situation is much as it is with GLMs. The key
assumptions are independence and that the mean variance relationship is
correct. The theory of quasi-likelihood tells you that you can make
valid inference based only on specifying the mean-variance relationship
for the response, rather than the whole distribution, with the price
being a small loss of efficiency. It follows that getting the
distribution exactly right is of secondary importance.

It's also quite easy to be misled by normal qq plots of the deviance
residuals when you have low count data. For example, section 4 of
  http://opus.bath.ac.uk/27091/1/qq_gam_resub.pdf
shows a real example where the usual qq plots look awful, suggesting
massive zero inflation, but if you compute the correct reference
quantiles for the qq plot you find that there is nothing wrong and no
evidence of zero inflation.

best,
Simon

ps. in response to the follow up discussion: The default link depends on 
the family, rather than being a  gam (or glm) default. Eg the default is 
log for the Poisson, but identity for the Gaussian.



On 06/11/13 21:46, Collin Lynch wrote:

Greetings, My question is more algorithmic than practical. What I am
trying to determine is: are the GAM algorithms used in the mgcv package
affected by non-normally distributed residuals?

As I understand the theory of linear models the Gauss-Markov theorem
guarantees that least-squares regression is optimal over all unbiased
estimators iff the data meet the conditions linearity, homoscedasticity,
independence, and normally-distributed residuals.  Absent the last
requirement it is optimal but only over unbiased linear estimators.

What I am trying to determine is whether or not it is necessary to check
for normally-distributed errors in a GAM from mgcv.  I know that the
unsmoothed terms, if any, will be fitted by ordinary least-squares but I
am unsure whether the default Penalized Iteratively Reweighted Least
Squares method used in the package is also based upon this assumption or
falls under any analogue to the Gauss-Markov Theorem.

Thank you in advance for any help.

Sincerely,
Collin Lynch.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603   http://people.bath.ac.uk/sw283

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Same code - error in one PC but not in other

2013-11-07 Thread Carl Witthoft
In the absence of any description of the computers themselves, it's hard to
say,  but there's a good chance that there is some object in your working
environment that's being accessed by your function -- and that object exists
on only one of the machines.

I would suggest you use debug() or browser()  to see what the value of HP
and HP.LUT are in each case.


Tursiops wrote
> Hello
> I am running exactly the same code on two different computers. In PC1 of
> the two, the code is working fine, whereas in PC2 I get this error:
> Error in HP.LUT[, 1] : incorrect number of dimensions
> HP.LUT is an object within a function, computed as
> HP.LUT <- which(HP == 1, arr.in = TRUE), where HP is a binary matrix.
> HP.LUT is not returned for further use. I have made sure that no object
> with the name HP.LUT is present in either PC1 or PC2, and I have used
> both RStudio and R 3.0.1.
> So I am puzzled and wonder what the reason for this might be.
> Any hint will be much appreciated.
> Thank you very much for your attention
> Juan A. Balbuena
> 
> --
> Dr. Juan A. Balbuena
> Marine Zoology Unit
> Cavanilles Institute of Biodiversity and Evolutionary Biology
> University of Valencia
> http://www.uv.es/~balbuena
> P.O. Box 22085
> http://www.uv.es/cavanilles/zoomarin/index.htm
> 46071 Valencia, Spain
> http://cetus.uv.es/mullpardb/index.html
> e-mail: j.a.balbuena@
> tel. +34 963 543 658   fax +34 963 543 733
> 
> NOTE! For shipments by EXPRESS COURIER use the following street address:
> C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
> __

> R-help@

>  mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.





--
View this message in context: 
http://r.789695.n4.nabble.com/Same-code-error-in-one-PC-but-not-in-other-tp4679980p4679992.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
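[Editor's note: Carl's environment hypothesis is worth checking with debug(), but there is also a self-contained way to reproduce this exact error. which(..., arr.ind = TRUE) only returns an index matrix when its argument is an array; if HP arrives as a plain vector on one machine (e.g. its dim attribute was dropped somewhere), the result is a vector and HP.LUT[, 1] fails with precisely this message. A sketch:]

```r
HP <- matrix(c(0, 1, 1, 0), nrow = 2)
HP.LUT <- which(HP == 1, arr.ind = TRUE)
is.matrix(HP.LUT)   # TRUE: a two-column (row, col) index matrix
HP.LUT[, 1]         # row indices, as the original code expects

HP.vec <- c(0, 1, 1, 0)                     # same data, dim attribute lost
lut2 <- which(HP.vec == 1, arr.ind = TRUE)  # arr.ind is ignored for vectors
is.matrix(lut2)     # FALSE: lut2[, 1] would now raise
                    # "incorrect number of dimensions"
```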


Re: [R] strange behaviour when subsetting a data.frame

2013-11-07 Thread Simon Hayward
So is there a way to make the subsetting behave as expected?



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Doran, Harold
Sent: 07 November 2013 16:52
To: 'Eckstädt, Elisabeth'; r-help@R-project.org
Subject: Re: [R] strange behaviour when subsetting a data.frame

Yes, but notice that

man2[3,] == .3
[1] FALSE

This is because of issues of machine precision when dealing with floating 
points, not a problem in R.  Comparisons for nearly equivalent numbers are done 
using all.equal() as shown below.

> all.equal(man2[3,], .3)
[1] TRUE




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Eckstädt, Elisabeth
Sent: Thursday, November 07, 2013 8:37 AM
To: r-help@R-project.org
Subject: [R] strange behaviour when subsetting a data.frame

Hello everyone,
I am experiencing an unfathomable behaviour of "subset" on a data.frame. This is 
a minimal reproducible example. The data.frame consists of only one column, 
which contains 10 ascending values (from 0.1 to 1). Subsetting for 0.1 works 
(gives me one row), but subsetting for 0.3 gives me zero rows. Doing the same 
with values from 1 to 10 works as expected. 

Beimischung=seq(0.1,1,0.1)
man2=data.frame(Beimischung)
subset(man2, Beimischung==0.3)
#---> gives 0 rows
subset(man2, Beimischung==0.1)
# ---> gives one row (as expected)

#also not working:

man2$Beimischung3=man2$Beimischung*10
subset(man2, Beimischung3==3)
# --> gives 0 rows

Does anybody have a clue for me?
Thanks in advance
Regards
Elisabeth
___
Dipl.-Ing. Elisabeth Eckstädt
Member of Academic Staff & PhD Student
Technische Universität Dresden
Faculty of Mechanical Engineering
Institue of Power Engineering
Chair of Building Energy Systems and Heat Supply

Phone +49 351 463 34709
Fax +49 351 463 37076
Web http://tu-dresden.de/mw/iet/ew

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

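[Editor's summary of the thread in working code: compare within a tolerance, or via all.equal(), instead of testing exact equality of floating-point values. The tolerance 1e-8 below is an arbitrary illustrative choice:]

```r
Beimischung <- seq(0.1, 1, 0.1)
man2 <- data.frame(Beimischung)

# tolerance-based comparison finds the row that == 0.3 misses
subset(man2, abs(Beimischung - 0.3) < 1e-8)

# equivalently, reuse all.equal() element-wise
keep <- vapply(man2$Beimischung,
               function(x) isTRUE(all.equal(x, 0.3)), logical(1))
man2[keep, , drop = FALSE]
```

Both approaches return exactly one row.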


[R] Error running MuMIn dredge function using glmer models

2013-11-07 Thread Martin Turcotte
Dear list, 
I am trying to use MuMIn to compare all possible mixed models using the dredge 
function on binomial data but I am getting an error message that I cannot 
decode. This error only occurs when I use glmer. When I use an lmer analysis on 
a different response variable every works great. 

Example using a simplified glmer model
global model: 
mod<- glmer(cbind(st$X2.REP.LIVE, st$X2.REP.DEAD) ~ DOMESTICATION*GLUC + 
(1|PAIR), data=st, na.action=na.omit , family=binomial) 

The response variables are the number of survival and dead insects (successes 
and failures)
DOMESTICATION is a 2 level factor.
GLUC is a continuous variable.
PAIR is coded as a factor or character (both ways fail).

This model functions correctly but when I try it with dredge() I get an error. 

g<- dredge(mod, beta=F, evaluate=F, rank='AIC')
Error in sprintf(gettext(fmt, domain = domain), ...) : 
  invalid type of argument[1]: 'symbol'

When I try with another rank the same thing happens:
chat<- deviance(mod)/58
g<- dredge(mod, beta=F, evaluate=F, rank='QAIC', chat=chat)
Error in sprintf(gettext(fmt, domain = domain), ...) : 
  invalid type of argument[1]: 'symbol'

Any suggestions would be greatly appreciated

thanks

Martin Turcotte, Ph. D.
mart.turco...@gmail.com





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] all combinations with replacement not ordered

2013-11-07 Thread Bert Gunter
... and actually, since u can be assumed to be of the form shown,

v <-do.call(expand.grid, split(rep(u,len),rep(u,e=len)))


should do.

-- Bert

On Thu, Nov 7, 2013 at 10:06 AM, Bert Gunter  wrote:
> Well, you can create the expand.grid data frame programmatically via:
>
> u <- 1:3
> len <- length(u)
> v <-do.call(expand.grid, split(rep(u,len),rep(seq_len(len),e=len)))
>
> And then you can use unique.array to get the unique rows after the sort:
>
> unique(t(apply(v,1,sort)))
>
> However, I agree with your sentiments. Not only does this seem
> inelegant, but it will not scale well.
>
> I would imagine a recursive approach would be more efficient -- as
> then only the sets you need would be produced and there'd be no
> sorting, etc. -- but I have neither the time nor interest to work it
> out.
>
> ... and I bet someone already has done this in some R package anyway.
>
> Cheers,
> Bert
>
> On Thu, Nov 7, 2013 at 9:04 AM, Ted Harding  wrote:
>> On 07-Nov-2013 13:38:29 Konstantin Tretiakov wrote:
>>> Hello!
>>>
>>> I need to obtain all possible combinations with replacement when
>>> order is not important.
>>> E.g. I have a population x{1,2,3}.
>>> So I can get (choose(3+3-1,3)=) 10 combinations from this population
>>> with 'size=3'.
>>> How can I get a list of all that combinations?
>>>
>>> I have tried 'expand.grid()' and managed to get only samples where
>>> order is important.
>>> 'combn()' gave me samples without replacement.
>>>
>>> Best regards,
>>> Konstantin Tretyakov.
>>
>> From your description I infer that, from {1,2,3}, you want the result:
>>
>>   1 1 1
>>   1 1 2
>>   1 1 3
>>   1 2 2
>>   1 2 3
>>   1 3 3
>>   2 2 2
>>   2 2 3
>>   2 3 3
>>   3 3 3
>>
>> The following will do that:
>>
>> u <- c(1,2,3)
>> unique(t(unique(apply(expand.grid(u,u,u),1,sort),margin=1)))
>>
>> #       [,1] [,2] [,3]
>> #  [1,]    1    1    1
>> #  [2,]    1    1    2
>> #  [3,]    1    1    3
>> #  [4,]    1    2    2
>> #  [5,]    1    2    3
>> #  [6,]    1    3    3
>> #  [7,]    2    2    2
>> #  [8,]    2    2    3
>> #  [9,]    2    3    3
>> # [10,]    3    3    3
>>
>> There may be a simpler way!
>> Ted.
>>
>> -
>> E-Mail: (Ted Harding) 
>> Date: 07-Nov-2013  Time: 17:04:50
>> This message was sent by XFMail
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> (650) 467-7374



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] AER ivreg diagnostics: question on DF of Sargan test

2013-11-07 Thread Achim Zeileis

Hélène,

thanks for spotting this! This is a bug in "AER". I had just tested the 
new diagnostics for regressions with 1 endogenous variable and hence 
never noticed the problem. But if there are > 1 endogenous variables, the 
df used in ivreg() (and hence the associated p-values) are too large.


I've fixed the problem in AER's devel-version and will release it on CRAN 
in the next days.


Thanks & best regards,
Z

On Thu, 7 Nov 2013, Hélène Huber-Yahi wrote:


Hello,
I'm new to R and I'm currently learning to use package AER, which is
extremely comprehensive and useful. I have one question related to the
diagnostics after ivreg: if I understood well, the Sargan test provided
states that the statistic should follow a Chi squared of degrees of freedom
equal to the number of excluded instruments minus one. But I read many
times that the degrees of freedom of this statistic is supposed to equal
the number of overidentifying restrictions, i.e. the number of excluded
instruments minus the number of endogenous variables tested. When comparing
with Stata results (estat overid after ivreg, same with ivreg2 output), the
statistic is the same as the one provided by R, only the p-value changes
because the distribution chosen is different. Is this command using a
different flavor of the Sargan test? I did not find the details in the AER
pdf.
I'm using Rstudio with R 3.0.2 (Windows 7) and AER is up to date. The
output I get from R is the following, where the Sargan DF is equal to 5,
while I thought it would be equal to 6-3=3. The data comes from Verbeek's
econometrics textbook and the example replicates the one in the book.
Dependent variable is log of wage, endogenous variables are education,
experience and its square (3 of them), excluded instruments are parents'
education etc (6 of them).


ivmodel <- ivreg(lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 + south76 |
    daded + momed + libcrd14 + age76 + age762 + nearc4 + black + smsa76 +
    south76, data = school)
summary(ivmodel, diagnostics = TRUE)

Call:
ivreg(formula = lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 +
   south76 | daded + momed + libcrd14 + age76 + age762 + nearc4 +
   black + smsa76 + south76, data = school)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.63375 -0.22253  0.02403  0.24350  1.32911 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.6064811  0.1126195  40.903  < 2e-16 ***
ed76         0.0848507  0.0066061  12.844  < 2e-16 ***
exp76        0.0796432  0.0164406   4.844 1.34e-06 ***
exp762      -0.0020376  0.0008257  -2.468   0.0136 *  
black       -0.1726723  0.0195231  -8.845  < 2e-16 ***
smsa76       0.1521693  0.0165207   9.211  < 2e-16 ***
south76     -0.1204765  0.0154904  -7.778 1.01e-14 ***

Diagnostic tests:
                 df1  df2 statistic p-value    
Weak instruments   6 2987   965.450  <2e-16 ***
Wu-Hausman         2 2988     1.949   0.143    
Sargan             5   NA     3.868   0.569    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3753 on 2990 degrees of freedom
Multiple R-Squared: 0.2868, Adjusted R-squared: 0.2854
Wald test: 178.6 on 6 and 2990 DF,  p-value: < 2.2e-16


Would this be caused by the fact that I'm using 2SLS and not GMM (at least
I suppose) to estimate the IV model? I apologize if this comes from a
misunderstanding on my part, and I thank you in advance for your help.

Best,

H. Huber

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
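[Editor's note: in concrete terms for the model above, the expected degrees of freedom are the number of over-identifying restrictions:]

```r
# excluded instruments: daded, momed, libcrd14, age76, age762, nearc4
n_excl_instruments <- 6
# endogenous regressors: ed76, exp76, exp762
n_endogenous <- 3

# correct Sargan df = number of over-identifying restrictions
n_excl_instruments - n_endogenous   # 3

# the pre-fix AER releases instead reported
n_excl_instruments - 1              # 5, as in the output above
```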


Re: [R] all combinations with replacement not ordered

2013-11-07 Thread Bert Gunter
Well, you can create the expand.grid data frame programmatically via:

u <- 1:3
len <- length(u)
v <-do.call(expand.grid, split(rep(u,len),rep(seq_len(len),e=len)))

And then you can use unique.array to get the unique rows after the sort:

unique(t(apply(v,1,sort)))

However, I agree with your sentiments. Not only does this seem
inelegant, but it will not scale well.

I would imagine a recursive approach would be more efficient -- as
then only the sets you need would be produced and there'd be no
sorting, etc. -- but I have neither the time nor interest to work it
out.

... and I bet someone already has done this in some R package anyway.

Cheers,
Bert

On Thu, Nov 7, 2013 at 9:04 AM, Ted Harding  wrote:
> On 07-Nov-2013 13:38:29 Konstantin Tretiakov wrote:
>> Hello!
>>
>> I need to obtain all possible combinations with replacement when
>> order is not important.
>> E.g. I have a population x{1,2,3}.
>> So I can get (choose(3+3-1,3)=) 10 combinations from this population
>> with 'size=3'.
>> How can I get a list of all that combinations?
>>
>> I have tried 'expand.grid()' and managed to get only samples where
>> order is important.
>> 'combn()' gave me samples without replacement.
>>
>> Best regards,
>> Konstantin Tretyakov.
>
> From your description I infer that, from {1,2,3}, you want the result:
>
>   1 1 1
>   1 1 2
>   1 1 3
>   1 2 2
>   1 2 3
>   1 3 3
>   2 2 2
>   2 2 3
>   2 3 3
>   3 3 3
>
> The following will do that:
>
> u <- c(1,2,3)
> unique(t(unique(apply(expand.grid(u,u,u),1,sort),margin=1)))
>
> #       [,1] [,2] [,3]
> #  [1,]    1    1    1
> #  [2,]    1    1    2
> #  [3,]    1    1    3
> #  [4,]    1    2    2
> #  [5,]    1    2    3
> #  [6,]    1    3    3
> #  [7,]    2    2    2
> #  [8,]    2    2    3
> #  [9,]    2    3    3
> # [10,]    3    3    3
>
> There may be a simpler way!
> Ted.
>
> -
> E-Mail: (Ted Harding) 
> Date: 07-Nov-2013  Time: 17:04:50
> This message was sent by XFMail
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] variable standardization in manova() call

2013-11-07 Thread Sergio Fonda
2013/11/6 Michael Friendly :
> On 11/4/2013 10:45 AM, Sergio Fonda wrote:
>>
>> Hi,
>> I'm not able to get information about the following question:
>>
>> is the variables standardization a default option in manova() (stats
>> package)?
>> Or if you want to compare variables with different units or scales and
>> rather different variances, you have to previously standardize the
>> variables ?
>>
>
> If you mean the response variables, manova() does not require equal
> variances and does not standardize.
>
>
> --
> Michael Friendly Email: friendly AT yorku DOT ca
> Professor, Psychology Dept. & Chair, Quantitative Methods
> York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
> 4700 Keele StreetWeb:   http://www.datavis.ca
> Toronto, ONT  M3J 1P3 CANADA

Hi Michael,
thank you for your reply. However, your remark was not entirely clear to me,
so I attach a short script I ran. Comparing the results of the manova() call
on the non-standardized and standardized versions of the same data frame
convinced me that it is not necessary to standardize the data before calling
manova(). The only difference is in the residual values, of course.
Could you kindly confirm my conclusion?

All the best,
Sergio Fonda
__
set.seed(1234)
# non-standardized series
x1 =  rnorm(n=10, mean=50, sd=11)
x2 =  rnorm(n=10, mean=93, sd=23)
x3 =  rnorm(n=10, mean=217, sd=52)  # x3, not x1: x3 is used in cbind() below
fact= rep(1:2,20)
glob1=data.frame(cbind(x1,x2,x3,fact))
fitta1=manova(cbind(x1,x2,x3)~fact, data=glob1)
fitta1.wilks=summary(fitta1, test="Wilks")
summary.aov(fitta1)

#after standardization
x.stand=scale(glob1[,-4])
glob2=data.frame(x.stand,fact)
fitta2=manova(cbind(x1,x2,x3)~fact, data=glob2)
fitta2.wilks=summary(fitta2, test="Wilks")
summary.aov(fitta2)

-- 
Sergio Fonda
Associate Professor of Bioengineering
Department of Life Sciences
University of Modena and Reggio Emilia
Modena, Italy

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
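[Editor's note: Sergio's conclusion can be checked directly. The multivariate test statistics are invariant under per-variable rescaling of the responses, because Wilks' lambda det(E)/det(E + H) is unchanged when both matrices transform as A'EA and A'HA for nonsingular A. A self-contained sketch on simulated data:]

```r
set.seed(1)
Y <- matrix(rnorm(60), 20, 3)            # three responses
Y <- sweep(Y, 2, c(10, 100, 1000), "*")  # put them on very different scales
g <- factor(rep(1:2, each = 10))

w_raw <- summary(manova(Y ~ g),        test = "Wilks")$stats["g", "Wilks"]
w_std <- summary(manova(scale(Y) ~ g), test = "Wilks")$stats["g", "Wilks"]
all.equal(w_raw, w_std)   # standardizing first does not change the test
```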


Re: [R] Adding the complementary log-link to binomial() and make.link()

2013-11-07 Thread Ben Bolker
Ken Knoblauch  inserm.fr> writes:

> 
> Roland Deutsch  tuwien.ac.at> writes:
> > in my research I frequently work with binomial 
> response models, which 
> > are of course part of the generalized linear 
> models. While I do use 
> > common link functions such as the logit, probit 
> and cloglog, I often 
> > have the need of invoking the lesser-known 
> Complementary Log link 
> > (Walter W. Piegorsch, 1992, "Complementary Log 
> Regression for 
> > Generalized Linear Models ", The American 
> Statistician , Vol. 46, No. 2, 
> > pp. 94-99 ) which is based on the exponential 
> distribution.
> > 
> > Before the release of R 3.0, I simply could 
> do so by adding the 
> > following lines to the long switch command 
> in make.link(...):
> > 
> > clog = {
> >  linkfun <- function(mu) qexp(mu)
> >  linkinv <- function(eta) pmax(.Machine$double.eps,pexp(eta))
> >  mu.eta <- function(eta) pmax(dexp(eta), .Machine$double.eps)
> >  valideta <- function(eta) all(eta > 0)
> >  },
> > 
> > and then add "clog" to the okLinks vector in 
> binomial(). However, I 
> 
> I wouldn't mess with make.link.  Why don't you just
> define your own custom link, for example following the
> example under help(family)?  This is what I did in
> the psyphy package and I just checked and
> they all still work under the current version of R.
> 
> > Thanks a lot,
> > Roland Deutsch
> 

This seems to work:

linkinfo <- list(link="clog",
 linkfun=function(mu) qexp(mu),
 linkinv=function(eta) pmax(.Machine$double.eps,pexp(eta)),
 mu.eta=function(eta) pmax(dexp(eta), .Machine$double.eps),
 valideta=function(eta) all(eta > 0))
binomClog <- binomial()
for (i in names(linkinfo))
binomClog[[i]] <- linkinfo[[i]]

set.seed(101)
d <- data.frame(x=runif(1000))
d$y <- rbinom(1000,prob=binomClog$linkinv(1+2*d$x),size=1)
library(lattice)
xyplot(y~x,data=d,type=c("p","smooth"))
g1 <- glm(y~x,data=d,family=binomClog)
library(MASS)
confint(g1)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] all combinations with replacement not ordered

2013-11-07 Thread Ted Harding
On 07-Nov-2013 13:38:29 Konstantin Tretiakov wrote:
> Hello!
> 
> I need to obtain all possible combinations with replacement when
> order is not important.
> E.g. I have a population x{1,2,3}.
> So I can get (choose(3+3-1,3)=) 10 combinations from this population
> with 'size=3'.
> How can I get a list of all that combinations?
> 
> I have tried 'expand.grid()' and managed to get only samples where
> order is important.
> 'combn()' gave me samples without replacement.
> 
> Best regards,
> Konstantin Tretyakov.

From your description I infer that, from {1,2,3}, you want the result:

  1 1 1
  1 1 2
  1 1 3
  1 2 2
  1 2 3
  1 3 3
  2 2 2
  2 2 3
  2 3 3
  3 3 3

The following will do that:

u <- c(1,2,3)
unique(t(unique(apply(expand.grid(u,u,u),1,sort),margin=1)))

#       [,1] [,2] [,3]
#  [1,]    1    1    1
#  [2,]    1    1    2
#  [3,]    1    1    3
#  [4,]    1    2    2
#  [5,]    1    2    3
#  [6,]    1    3    3
#  [7,]    2    2    2
#  [8,]    2    2    3
#  [9,]    2    3    3
# [10,]    3    3    3

There may be a simpler way!
Ted.

-
E-Mail: (Ted Harding) 
Date: 07-Nov-2013  Time: 17:04:50
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
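[Editor's note: there is indeed a simpler way that avoids the generate-then-deduplicate step. A dependency-free recursive sketch (the helper name multiset_combn is illustrative): emit elements in non-decreasing order so each multiset appears exactly once.]

```r
multiset_combn <- function(u, size) {
  # u must be sorted; each branch may reuse the current element or move on
  if (size == 0) return(list(integer(0)))
  out <- list()
  for (i in seq_along(u)) {
    for (rest in multiset_combn(u[i:length(u)], size - 1)) {
      out <- c(out, list(c(u[i], rest)))
    }
  }
  out
}

res <- do.call(rbind, multiset_combn(1:3, 3))
nrow(res)   # choose(3 + 3 - 1, 3) = 10 rows, matching the listing above
```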


Re: [R] strange behaviour when subsetting a data.frame

2013-11-07 Thread Doran, Harold
Yes, what I would do is write a function around the all.equal() comparisons on 
the data to see which values are "nearly" identical to the ones desired in the 
subset. If TRUE, I would retain those values and discard the others.


-Original Message-
From: Simon Hayward [mailto:simon.hayw...@infinitycloud.com] 
Sent: Thursday, November 07, 2013 11:59 AM
To: Doran, Harold; 'Eckstädt, Elisabeth'; r-help@R-project.org
Subject: RE: strange behaviour when subsetting a data.frame

So is there a way to make the subsetting behave as expected?



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Doran, Harold
Sent: 07 November 2013 16:52
To: 'Eckstädt, Elisabeth'; r-help@R-project.org
Subject: Re: [R] strange behaviour when subsetting a data.frame

Yes, but notice that

man2[3,] == .3
[1] FALSE

This is because of issues of machine precision when dealing with floating 
points, not a problem in R.  Comparisons for nearly equivalent numbers are done 
using all.equal() as shown below.

> all.equal(man2[3,], .3)
[1] TRUE
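
If you want subset() itself to behave as expected, one option is a tolerance-based comparison instead of `==` (a sketch; the tolerance 1e-8 is an ad-hoc choice, not a canonical value):

```r
# Subset with a tolerance instead of exact equality on doubles.
# The tolerance 1e-8 is an ad-hoc choice, not a canonical value.
Beimischung <- seq(0.1, 1, 0.1)
man2 <- data.frame(Beimischung)
subset(man2, abs(Beimischung - 0.3) < 1e-8)
# one row, as intended
```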




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Eckstädt, Elisabeth
Sent: Thursday, November 07, 2013 8:37 AM
To: r-help@R-project.org
Subject: [R] strange behaviour when subsetting a data.frame

Hello everyone,
I am experiencing an unfathomable behaviour of "subset" on a data.frame. This is 
a minimal reproducible example. The data.frame consists of only one column, 
which contains 10 ascending values (from 0.1 to 1). Subsetting for 0.1 is 
working (gives me one row), subsetting for 0.3 gives me zero rows? Doing the 
same with values from 1 to 10 is working out well (as expected). 

Beimischung=seq(0.1,1,0.1)
man2=data.frame(Beimischung)
subset(man2, Beimischung==0.3)
# ---> gives 0 rows
subset(man2, Beimischung==0.1)
# ---> gives one row (as expected)

# also not working:

man2$Beimischung3=man2$Beimischung*10
subset(man2, Beimischung3==3)
# --> gives 0 rows

Does anybody have a clue for me?
Thanks in advance
Regards
Elisabeth
___
Dipl.-Ing. Elisabeth Eckstädt
Member of Academic Staff & PhD Student
Technische Universität Dresden
Faculty of Mechanical Engineering
Institute of Power Engineering
Chair of Building Energy Systems and Heat Supply

Phone +49 351 463 34709
Fax +49 351 463 37076
Web http://tu-dresden.de/mw/iet/ew



Re: [R] strange behaviour when subsetting a data.frame

2013-11-07 Thread Doran, Harold
Yes, but notice that

man2[3,] == .3
[1] FALSE

This is because of issues of machine precision when dealing with floating 
points, not a problem in R.  Comparisons for nearly equivalent numbers are done 
using all.equal() as shown below.

> all.equal(man2[3,], .3)
[1] TRUE




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Eckstädt, Elisabeth
Sent: Thursday, November 07, 2013 8:37 AM
To: r-help@R-project.org
Subject: [R] strange behaviour when subsetting a data.frame

Hello everyone,
I am experiencing an unfathomable behaviour of "subset" on a data.frame. This is 
a minimal reproducible example. The data.frame consists of only one column, 
which contains 10 ascending values (from 0.1 to 1). Subsetting for 0.1 is 
working (gives me one row), subsetting for 0.3 gives me zero rows? Doing the 
same with values from 1 to 10 is working out well (as expected). 

Beimischung=seq(0.1,1,0.1)
man2=data.frame(Beimischung)
subset(man2, Beimischung==0.3)
# ---> gives 0 rows
subset(man2, Beimischung==0.1)
# ---> gives one row (as expected)

# also not working:

man2$Beimischung3=man2$Beimischung*10
subset(man2, Beimischung3==3)
# --> gives 0 rows

Does anybody have a clue for me?
Thanks in advance
Regards
Elisabeth
___
Dipl.-Ing. Elisabeth Eckstädt
Member of Academic Staff & PhD Student
Technische Universität Dresden
Faculty of Mechanical Engineering
Institute of Power Engineering
Chair of Building Energy Systems and Heat Supply

Phone +49 351 463 34709
Fax +49 351 463 37076
Web http://tu-dresden.de/mw/iet/ew



Re: [R] all combinations with replacement not ordered

2013-11-07 Thread William Dunlap
Is this what you want?
   f <- function (x, m)  combn(x + m - 1, m) - seq_len(m) + 1

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
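
For what it's worth, a quick check of this against the example population 1:3 with size 3 (`res` is my own variable name, not part of the original post):

```r
f <- function (x, m)  combn(x + m - 1, m) - seq_len(m) + 1

res <- f(3, 3)   # columns are the unordered size-3 multisets of 1:3
ncol(res)        # 10, i.e. choose(3 + 3 - 1, 3)
t(res)           # one multiset per row, each nondecreasing
```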


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of Konstantin Tretiakov
> Sent: Thursday, November 07, 2013 5:38 AM
> To: r-help@r-project.org
> Subject: [R] all combinations with replacement not ordered
> 
> Hello!
> 
> I need to obtain all possible combinations with replacement when order is
> not important.
> E.g. I have a population x{1,2,3}.
> So I can get (choose(3+3-1,3)=) 10 combinations from this population with
> 'size=3'.
> How can I get a list of all that combinations?
> 
> I have tried 'expand.grid()' and managed to get only samples where order is
> important.
> 'combn()' gave me samples without replacement.
> 
> Best regards,
> Konstantin Tretyakov.
> 



[R] Same code - error in one PC but not in other

2013-11-07 Thread Juan Antonio Balbuena

   Hello
   I am running exactly the same code on two different computers. On PC1 the
   code is working fine, whereas on PC2 I get this error:
   Error in HP.LUT[, 1] : incorrect number of dimensions
   HP.LUT is an object within a function computed as HP.LUT <- which(HP == 1,
   arr.ind=TRUE), where HP is a binary matrix.
   HP.LUT is not returned for further use. I have made sure that no object with
   the name HP.LUT is present on either PC1 or PC2, and I have used both RStudio
   and R 3.0.1.
   So I am puzzled and wonder what the reason for this might be.
   Any hint will be much appreciated.
   Thank you very much for your attention
   Juan A. Balbuena

   --

   Dr. Juan A. Balbuena
   Marine Zoology Unit
   Cavanilles Institute of Biodiversity and Evolutionary Biology
   University of
   Valencia
   [1]http://www.uv.es/~balbuena
   P.O. Box 22085
   [2]http://www.uv.es/cavanilles/zoomarin/index.htm
   46071 Valencia, Spain
   [3]http://cetus.uv.es/mullpardb/index.html
   e-mail: [4]j.a.balbu...@uv.estel. +34 963 543 658fax +34 963 543 733
   
   NOTE! For shipments by EXPRESS COURIER use the following street address:
   C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
   

References

   1. http://www.uv.es/%7Ebalbuena
   2. http://www.uv.es/cavanilles/zoomarin/index.htm
   3. http://cetus.uv.es/mullpardb/index.html
   4. mailto:j.a.balbu...@uv.es
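
One possibility worth checking (a guess, since the function code is not shown): which() returns a two-column index matrix only when its argument is a matrix or array. If HP arrives as a plain vector on one machine (e.g. after dropped dimensions), HP.LUT has no dim attribute and HP.LUT[, 1] fails with exactly this message. A minimal illustration:

```r
# which() returns a (row, col) index matrix only when its argument
# is a matrix/array; for a plain vector it returns positions, and
# indexing that result with [, 1] gives exactly this error.
HP <- matrix(c(0, 1, 1, 0), nrow = 2)
which(HP == 1, arr.ind = TRUE)             # 2-column index matrix
which(as.vector(HP) == 1, arr.ind = TRUE)  # plain vector of positions
```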


Re: [R] Adding the complementary log-link to binomial() and make.link()

2013-11-07 Thread Ken Knoblauch
Roland Deutsch  tuwien.ac.at> writes:
> in my research I frequently work with binomial response models, which
> are of course part of the generalized linear models. While I do use
> common link functions such as the logit, probit and cloglog, I often
> have the need of invoking the lesser-known Complementary Log link
> (Walter W. Piegorsch, 1992, "Complementary Log Regression for
> Generalized Linear Models", The American Statistician, Vol. 46, No. 2,
> pp. 94-99) which is based on the exponential distribution.
> 
> Before the release of R 3.0, I simply could do so by adding the
> following lines to the long switch command in make.link(...):
> 
> clog = {
>  linkfun <- function(mu) qexp(mu)
>  linkinv <- function(eta) pmax(.Machine$double.eps,pexp(eta))
>  mu.eta <- function(eta) pmax(dexp(eta), .Machine$double.eps)
>  valideta <- function(eta) all(eta > 0)
>  },
> 
> and then add "clog" to the okLinks vector in binomial(). However, I

I wouldn't mess with make.link.  Why don't you just
define your own custom link, for example following the
example under help(family)?  This is what I did in
the psyphy package and I just checked and
they all still work under the current version of R.
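
Concretely, a sketch of such a custom link defined as a "link-glm" object in user code, following the pattern in help(family) and reusing the function bodies from the clog definition quoted above:

```r
# A custom complementary-log link built as a "link-glm" object,
# reusing the function bodies from the clog definition quoted above.
clog <- structure(list(
  linkfun  = function(mu)  qexp(mu),
  linkinv  = function(eta) pmax(.Machine$double.eps, pexp(eta)),
  mu.eta   = function(eta) pmax(dexp(eta), .Machine$double.eps),
  valideta = function(eta) all(eta > 0),
  name     = "clog"
), class = "link-glm")

# binomial() accepts a "link-glm" object directly, so neither
# make.link() nor its okLinks vector needs to be edited:
fam <- binomial(link = clog)
```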

> Thanks a lot,
> Roland Deutsch

-- 
Kenneth Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html



Re: [R] reading in stata file with read.dta works in R x64 3.0.1 and crashes R x64 3.0.2

2013-11-07 Thread Anthony Damico
Sorry, I missed that.   install.packages('foreign')   solved the problem,
thank you!!  :)


On Thu, Nov 7, 2013 at 10:41 AM, Prof Brian Ripley wrote:

> See the posting guide: your version of 'foreign' is not current:
> http://cran.r-project.org/web/packages/foreign/index.html .
>
> Please update your packages and try again.  (This looks very like a bug in
> recently contributed code that has already been fixed.)
>
>
> On 07/11/2013 13:40, Anthony Damico wrote:
>
>> this file
>>
>>
>> http://www.electionstudies.org/studypages/data/anes_
>> mergedfile_1992to1997/anes_mergedfile_1992to1997_dta.zip
>>
>> can be downloaded after free registration on this page
>>
>>  http://electionstudies.org/studypages/download/registration_form.php
>>
>>
>> imports properly in windows R x64 3.0.1 but causes R x64 3.0.2 to crash
>>
>>
>> the crash occurs at the line
>>
>> rval <- .External(do_readStata, file)
>>
>>
>> at the bottom of this e-mail, i have included code that will reproduce the
>> problem exactly in 3.0.2 but not 3.0.1
>>
>>
>> and if i don't load the memisc and Hmisc packages, i get an error instead
>> of a crash..
>>
>>
>>  x <- read.dta( fp , convert.factors = FALSE )
>>>
>> Error in read.dta(fp, convert.factors = FALSE) :
>>Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'
>>
>>
>> ..but if i load both (as shown in the script below), R just dies.
>>
>>
>> =
>>
>>
>> sessionInfo for 3.0.1 (the version that works)
>>
>>  sessionInfo()
>>>
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>> States.1252LC_MONETARY=English_United States.1252
>> LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] grid  splines   stats graphics  grDevices utils datasets
>> methods   base
>>
>> other attached packages:
>> [1] memisc_0.96-6   MASS_7.3-26 lattice_0.20-15 Hmisc_3.12-2
>> Formula_1.1-1   survival_2.37-4 httr_0.2stringr_0.6.2
>> foreign_0.8-53
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.14.4 digest_0.6.3   RCurl_1.95-4.1 rpart_4.1-1
>> tools_3.0.1
>>
>>
>> =
>>
>> sessionInfo in 3.0.2 right before the crash --
>>
>>  sessionInfo()
>>>
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] grid  splines   stats graphics  grDevices utils datasets
>> [8] methods   base
>>
>> other attached packages:
>> [1] memisc_0.96-6   MASS_7.3-29 lattice_0.20-23 Hmisc_3.12-2
>> [5] Formula_1.1-1   survival_2.37-4 httr_0.2stringr_0.6.2
>> [9] foreign_0.8-55
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.14.4 digest_0.6.3   RCurl_1.95-4.1 rpart_4.1-3
>> tools_3.0.2
>>
>>
>>
>>
>> =
>>
>>
>> # code that works in 3.0.1 and crashes in 3.0.2 --
>>
>>
>> setwd( "C:/my directory/anes")
>> your.username <- "em...@address.com"
>> your.password <- "your.password"
>>
>>
>>
>>
>> require(foreign) # load foreign package (converts data files into R)
>> require(stringr)# load stringr package (manipulates character strings
>> easily)
>> require(httr)# load httr package (downloads files from the web,
>> with SSL and cookies)
>> require(Hmisc) # load Hmisc package (loads spss.get function)
>> require(memisc)# load memisc package (loads spss portable table
>> import functions)
>>
>> # construct a list containing the pre-specified login information
>> values <-
>>  list(
>>  "email" = your.username ,
>>  "pass" = your.password
>>  )
>>
>> # contact the anes website to log in
>> POST( "http://www.electionstudies.org/studypages/download/login-
>> process.php"
>> , body = values )
>>
>> # download the `all_datasets` page to figure out what needs to be
>> downloaded
>> z <- GET( "
>> http://www.electionstudies.org/studypages/download/
>> datacenter_all_datasets.php"
>> )
>>
>> # create a temporary file and a temporary directory
>> tf <- tempfile() ; td <- tempdir()
>>
>> # download the damn file
>> z <- GET( "
>> http://www.electionstudies.org/studypages/data/anes_
>> mergedfile_1992to1997/anes_mergedfile_1992to1997_dta.zip"
>> )
>>
>> # save the result to a temporary file on the local disk
>> writeBin( z$content , tf )
>>
>> # unzip that temporary file to an equally-temporary directory
>> z <- unzip( tf , exdir = td )
>>
>> # find which one it is from among everything zipped up together..
>> fp <- z[ grep( 'dta' , z ) ]
>>
>> # ..import that puppy
>> x <- read.dta( fp , convert.factors = FALSE )
>> # crash.
>>

Re: [R] reading in stata file with read.dta works in R x64 3.0.1 and crashes R x64 3.0.2

2013-11-07 Thread Prof Brian Ripley
See the posting guide: your version of 'foreign' is not current: 
http://cran.r-project.org/web/packages/foreign/index.html .


Please update your packages and try again.  (This looks very like a bug 
in recently contributed code that has already been fixed.)


On 07/11/2013 13:40, Anthony Damico wrote:

this file


http://www.electionstudies.org/studypages/data/anes_mergedfile_1992to1997/anes_mergedfile_1992to1997_dta.zip

can be downloaded after free registration on this page

 http://electionstudies.org/studypages/download/registration_form.php


imports properly in windows R x64 3.0.1 but causes R x64 3.0.2 to crash


the crash occurs at the line

rval <- .External(do_readStata, file)


at the bottom of this e-mail, i have included code that will reproduce the
problem exactly in 3.0.2 but not 3.0.1


and if i don't load the memisc and Hmisc packages, i get an error instead
of a crash..



x <- read.dta( fp , convert.factors = FALSE )

Error in read.dta(fp, convert.factors = FALSE) :
   Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'


..but if i load both (as shown in the script below), R just dies.


=


sessionInfo for 3.0.1 (the version that works)


sessionInfo()

R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252
LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid  splines   stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] memisc_0.96-6   MASS_7.3-26 lattice_0.20-15 Hmisc_3.12-2
Formula_1.1-1   survival_2.37-4 httr_0.2stringr_0.6.2
foreign_0.8-53

loaded via a namespace (and not attached):
[1] cluster_1.14.4 digest_0.6.3   RCurl_1.95-4.1 rpart_4.1-1
tools_3.0.1


=

sessionInfo in 3.0.2 right before the crash --


sessionInfo()

R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid  splines   stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] memisc_0.96-6   MASS_7.3-29 lattice_0.20-23 Hmisc_3.12-2
[5] Formula_1.1-1   survival_2.37-4 httr_0.2stringr_0.6.2
[9] foreign_0.8-55

loaded via a namespace (and not attached):
[1] cluster_1.14.4 digest_0.6.3   RCurl_1.95-4.1 rpart_4.1-3
tools_3.0.2




=


# code that works in 3.0.1 and crashes in 3.0.2 --


setwd( "C:/my directory/anes")
your.username <- "em...@address.com"
your.password <- "your.password"




require(foreign) # load foreign package (converts data files into R)
require(stringr)# load stringr package (manipulates character strings
easily)
require(httr)# load httr package (downloads files from the web,
with SSL and cookies)
require(Hmisc) # load Hmisc package (loads spss.get function)
require(memisc)# load memisc package (loads spss portable table
import functions)

# construct a list containing the pre-specified login information
values <-
 list(
 "email" = your.username ,
 "pass" = your.password
 )

# contact the anes website to log in
POST( "http://www.electionstudies.org/studypages/download/login-process.php";
, body = values )

# download the `all_datasets` page to figure out what needs to be downloaded
z <- GET( "
http://www.electionstudies.org/studypages/download/datacenter_all_datasets.php";
)

# create a temporary file and a temporary directory
tf <- tempfile() ; td <- tempdir()

# download the damn file
z <- GET( "
http://www.electionstudies.org/studypages/data/anes_mergedfile_1992to1997/anes_mergedfile_1992to1997_dta.zip";
)

# save the result to a temporary file on the local disk
writeBin( z$content , tf )

# unzip that temporary file to an equally-temporary directory
z <- unzip( tf , exdir = td )

# find which one it is from among everything zipped up together..
fp <- z[ grep( 'dta' , z ) ]

# ..import that puppy
x <- read.dta( fp , convert.factors = FALSE )
# crash.





--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595


Re: [R] strange behaviour when subsetting a data.frame

2013-11-07 Thread Sarah Goslee
R FAQ 7.31.

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

On Thu, Nov 7, 2013 at 8:36 AM, Eckstädt, Elisabeth
 wrote:
> Hello everyone,
> I am experiencing an unfathomable behaviour of "subset" on a data.frame. This 
> is a minimal reproducible example. The data.frame consists of only one column, 
> which contains 10 ascending values (from 0.1 to 1). Subsetting for 0.1 is 
> working (gives me one row), subsetting for 0.3 gives me zero rows? Doing the 
> same with values from 1 to 10 is working out well (as expected).
>
> Beimischung=seq(0.1,1,0.1)
> man2=data.frame(Beimischung)
> subset(man2, Beimischung==0.3)
> # ---> gives 0 rows
> subset(man2, Beimischung==0.1)
> # ---> gives one row (as expected)
>
> # also not working:
>
> man2$Beimischung3=man2$Beimischung*10
> subset(man2, Beimischung3==3)
> # --> gives 0 rows
>
> Does anybody have a clue for me?
> Thanks in advance
> Regards
> Elisabeth



[R] Checking datetime value

2013-11-07 Thread vikrant
Hi ,
I would like to check whether a column contains datetime values (the column may
contain any datetime format, or just numbers or strings). I'd appreciate your help.
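
One possible approach (a sketch; the candidate formats below are my own assumptions, to be extended for your data): try parsing each value with strptime() and see which entries succeed.

```r
# Returns TRUE for entries that parse under at least one candidate format.
# The `formats` vector is illustrative, not exhaustive.
looks_like_datetime <- function(x,
    formats = c("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d/%m/%Y")) {
  x <- as.character(x)
  ok <- rep(FALSE, length(x))
  for (f in formats) ok <- ok | !is.na(strptime(x, format = f))
  ok
}
looks_like_datetime(c("2013-11-07", "07/11/2013", "abc", "42"))
# TRUE TRUE FALSE FALSE
```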




--
View this message in context: 
http://r.789695.n4.nabble.com/Checking-datetime-value-tp4679963.html
Sent from the R help mailing list archive at Nabble.com.



[R] strange behaviour when subsetting a data.frame

2013-11-07 Thread Eckstädt , Elisabeth
Hello everyone,
I am experiencing an unfathomable behaviour of "subset" on a data.frame. This is 
a minimal reproducible example. The data.frame consists of only one column, 
which contains 10 ascending values (from 0.1 to 1). Subsetting for 0.1 is 
working (gives me one row), subsetting for 0.3 gives me zero rows? Doing the 
same with values from 1 to 10 is working out well (as expected). 

Beimischung=seq(0.1,1,0.1)
man2=data.frame(Beimischung)
subset(man2, Beimischung==0.3)
# ---> gives 0 rows
subset(man2, Beimischung==0.1)
# ---> gives one row (as expected)

# also not working:

man2$Beimischung3=man2$Beimischung*10
subset(man2, Beimischung3==3)
# --> gives 0 rows

Does anybody have a clue for me?
Thanks in advance
Regards
Elisabeth
___
Dipl.-Ing. Elisabeth Eckstädt
Member of Academic Staff & PhD Student
Technische Universität Dresden
Faculty of Mechanical Engineering
Institute of Power Engineering
Chair of Building Energy Systems and Heat Supply

Phone +49 351 463 34709
Fax +49 351 463 37076
Web http://tu-dresden.de/mw/iet/ew



[R] all combinations with replacement not ordered

2013-11-07 Thread Konstantin Tretiakov
Hello!

I need to obtain all possible combinations with replacement when order is
not important.
E.g. I have a population x{1,2,3}.
So I can get (choose(3+3-1,3)=) 10 combinations from this population with
'size=3'.
How can I get a list of all that combinations?

I have tried 'expand.grid()' and managed to get only samples where order is
important.
'combn()' gave me samples without replacement.

Best regards,
Konstantin Tretyakov.



[R] AER ivreg diagnostics: question on DF of Sargan test

2013-11-07 Thread Hélène Huber-Yahi
Hello,
I'm new to R and I'm currently learning to use package AER, which is
extremely comprehensive and useful. I have one question related to the
diagnostics after ivreg: if I understood well, the Sargan test provided
states that the statistic should follow a Chi squared of degrees of freedom
equal to the number of excluded instruments minus one. But I read many
times that the degrees of freedom of this statistic is supposed to equal
the number of overidentifying restrictions, i.e. the number of excluded
instruments minus the number of endogenous variables tested. When comparing
with Stata results (estat overid after ivreg, same with ivreg2 output), the
statistic is the same as the one provided by R, only the p-value changes
because the distribution chosen is different. Is this command using a
different flavor of the Sargan test ? I did not find the details in the AER
pdf.
I'm using Rstudio with R 3.0.2 (Windows 7) and AER is up to date. The
output I get from R is the following, where the Sargan DF is equal to 5,
while I thought it would be equal to 6-3=3. The data comes from Verbeek's
econometrics textbook and the example replicates the one in the book.
Dependent variable is log of wage, endogenous variables are education,
experience and its square (3 of them), excluded instruments are parents'
education etc (6 of them).

> ivmodel <- ivreg(lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 + south76 |
+   daded + momed + libcrd14 + age76 + age762 + nearc4 + black + smsa76 +
+   south76, data = school)
> summary(ivmodel, diagnostics = TRUE)
Call:
ivreg(formula = lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 +
south76 | daded + momed + libcrd14 + age76 + age762 + nearc4 +
black + smsa76 + south76, data = school)

Residuals:
 Min   1Q   Median   3Q  Max
-1.63375 -0.22253  0.02403  0.24350  1.32911

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.6064811  0.1126195  40.903  < 2e-16 ***
ed76 0.0848507  0.0066061  12.844  < 2e-16 ***
exp760.0796432  0.0164406   4.844 1.34e-06 ***
exp762  -0.0020376  0.0008257  -2.468   0.0136 *
black   -0.1726723  0.0195231  -8.845  < 2e-16 ***
smsa76   0.1521693  0.0165207   9.211  < 2e-16 ***
south76 -0.1204765  0.0154904  -7.778 1.01e-14 ***

Diagnostic tests:
  df1  df2 statistic p-value
Weak instruments6 2987   965.450  <2e-16 ***
Wu-Hausman  2 2988 1.949   0.143
Sargan  5   NA 3.868   0.569
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3753 on 2990 degrees of freedom
Multiple R-Squared: 0.2868, Adjusted R-squared: 0.2854
Wald test: 178.6 on 6 and 2990 DF,  p-value: < 2.2e-16


Would this be caused by the fact that I'm using 2SLS and not GMM (at least
I suppose) to estimate the IV model ? I apologize if this comes from a
misunderstanding from my part, and I thank you in advance for your help.

Best,

H. Huber
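
[Editor's note, not part of the original question: whatever distribution convention one prefers, the p-value can be recomputed from the reported statistic under the df you believe is correct, here 6 - 3 = 3.]

```r
# Sargan statistic as reported by summary(); p-value recomputed with
# df = 3 (excluded instruments minus endogenous regressors) instead of
# the df = 5 that summary() used.
pchisq(3.868, df = 3, lower.tail = FALSE)
```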



[R] Adding the complementary log-link to binomial() and make.link()

2013-11-07 Thread Roland Deutsch
Dear R Help,

in my research I frequently work with binomial response models, which 
are of course part of the generalized linear models. While I do use 
common link functions such as the logit, probit and cloglog, I often 
have the need of invoking the lesser-known Complementary Log link 
(Walter W. Piegorsch, 1992, "Complementary Log Regression for 
Generalized Linear Models ", The American Statistician , Vol. 46, No. 2, 
pp. 94-99 ) which is based on the exponential distribution.

Before the release of R 3.0, I simply could do so by adding the 
following lines to the long switch command in make.link(...):

clog = {
 linkfun <- function(mu) qexp(mu)
 linkinv <- function(eta) pmax(.Machine$double.eps,pexp(eta))
 mu.eta <- function(eta) pmax(dexp(eta), .Machine$double.eps)
 valideta <- function(eta) all(eta > 0)
 },

and then add "clog" to the okLinks vector in binomial(). However, I 
recently installed R 3.0.2 and cannot get the clog link to work. I guess 
it has to do with the fact that both make.link and binomial are 
residing in the Global environment, and thus I do not know how to access 
the C routine C_binomial_dev_resids to compute the deviance residuals 
within binomial().

While I would appreciate any help on this issue, I also thought it may 
benefit other users in my research area to have the complementary log 
link included in make.link. Any chance of this happening?

Thanks a lot,
Roland Deutsch



[R] loop for backtesting

2013-11-07 Thread wintwin111
First of all, sorry for my bad English, but it's not my native language.

I am working on a paper on Portfolio Optimization with Markowitz and Lower
Partial Moments. 

I want to compare the returns of the minimum variance portfolios from both
methods. First of all I have an in-sample multivariate time series of 4
stocks reaching from 1 October 2011 to 1 October 2012 (log-returns, daily
data, 257 observations for each stock).

I start by optimizing my portfolio using the package tseries as follows:

portfolio.optim(x,pm=mean(x),riskless=F,rf=0,shorts=F,reslow=NULL,reshigh=NULL,covmat=cov(x))

With this I get the weights, the mean return over the whole period, the
standard deviation and the returns on each day for my in-sample optimal
Markowitz portfolio.

The out-of-sample data reaches from 2 October 2012 to 1 October 2013
(log-returns, daily data, 253 observations for each stock, again a
multivariate time series). Now I want to optimize the portfolio 253 times.
Each time the log-returns for one day should be added to the original
in-sample time series (first optimization: the 257 in-sample observations
plus the first day of the out-of-sample data, and so on). I should then get
new weights for each of the 253 periods and therefore new portfolio returns
for every period.

My advisor at the university told me I can't use backtesting packages because
they can't handle the Lower Partial Moments part of my analysis. The problem
is just with the Markowitz portfolio optimization.

I hope you can help me with my problem

greetings wintwin111
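
[Editor's note: a minimal sketch of the expanding-window loop for the Markowitz part. It assumes `insample` and `outsample` are numeric return matrices with one column per stock; those names and the loop structure are mine, not from the original post.]

```r
library(tseries)

n.out    <- nrow(outsample)
weights  <- matrix(NA, n.out, ncol(insample))  # one weight vector per day
port.ret <- numeric(n.out)                     # realized portfolio returns
window   <- insample                           # expanding estimation window

for (i in seq_len(n.out)) {
  opt <- portfolio.optim(window, shorts = FALSE, covmat = cov(window))
  weights[i, ] <- opt$pw                       # optimal weights on day i
  port.ret[i] <- sum(opt$pw * outsample[i, ])  # apply them out of sample
  window <- rbind(window, outsample[i, , drop = FALSE])  # grow the window
}
```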



--
View this message in context: 
http://r.789695.n4.nabble.com/loop-for-backtesting-tp4679962.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] FW: Nadaraya-Watson kernel

2013-11-07 Thread Liaw, Andy
Use KernSmooth (one of the recommended packages that are included in R 
distribution).  E.g.,

> library(KernSmooth)
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
> x <- seq(0, 1, length=201)
> y <- 4 * cos(2*pi*x) + rnorm(x)
> f <- locpoly(x, y, degree=0, kernel="epan", bandwidth=.1)
> plot(x, y)
> lines(f, lwd=2)

Andy

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Ms khulood aljehani
Sent: Tuesday, November 05, 2013 9:49 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] FW: Nadaraya-Watson kernel

From: aljehan...@hotmail.com
To: r-help@r-project.org
Subject: Nadaraya-Watson kernel
Date: Tue, 5 Nov 2013 17:42:13 +0300




Hello
 
I want to compute the Nadaraya-Watson kernel estimate when the kernel 
function is the Epanechnikov kernel.
I use the command
ksmooth(x, y, kernel="normal", bandwidth)
 
The argument ( kernel="normal" ) accepts only the normal and box kernels;
I want to compute it when the kernel is the Epanechnikov.
 
 
thank you
 
  




Re: [R] problem with interaction in lmer even after creating an "interaction variable"

2013-11-07 Thread Ben Bolker
a_lampei  uni-tuebingen.de> writes:

> 
> Dear all,
> I have a problem with interactions in lmer. I have 2 factors (garden and
> gebiet) which interact, plus one other variable (home), 
> dataframe arr. When
> I put:
> /
> lmer (biomass ~ home + garden:gebiet +  ( 1|Block), data = arr)/
> 
> it writes:
> /Error in lme4::lFormula(formula = biomass ~ home + garden:gebiet 
> + (1 |  : 
>   rank of X = 28 < ncol(X) = 30/
> 
> In the lmer help I found out that if not all combination of
>  the interaction
> are realized, lmer has problems and one should do new variable using
> "interaction", which I did:
> /
> arr$agg <- interaction (arr$gebiet, arr$garden, drop = TRUE)/
> 
> when I fit the interaction term now:
>  /
> lmer (biomass ~ home + agg+  ( 1|Block), data = arr)/
> 
> The error does not change:
> /
> Error in lme4::lFormula(formula = biomass ~ home + agg + (1 | Block),  : 
>   rank of X = 28 < ncol(X) = 29/
> 
> No NAs are in the given variables in the dataset.
> 
> Interestingly it works when I put only given interaction like
> 
> /lmer (biomass ~ agg +  ( 1|Block), data = arr)/
> 
> Even following models work:
> /lmer (biomass ~ gebiet*garden +  ( 1|Block), data = arr)
> lmer (biomass ~ garden + garden:gebiet  +( 1|Block), data = arr)/
> 
> But if I add the interaction term in the new format of the new
> variable, it reports the same error again.
> 
> /lmer (biomass ~ garden + agg  +( 1|Block), data = arr)/
> 
> If I put any other variable from the very same dataframe 
> (not only variable
> "home"), the error is reported again.
> 
> I do not understand it, the new variable is just another 
> factor now, or? And
> it is in the same dataframe, it has the same length.
> 
> Does anyone have any idea?
> 
> Thanks a lot, Anna
> 

  This probably belongs on r-sig-mixed-models.

  Presumably 'home' is still correlated with one of the
columns of 'garden:gebiet'.

  Here's an example of how you can use svd() to find out which
of your columns are collinear:

set.seed(101)
d <- data.frame(x=runif(100),y=1:100,z=2:101)
m <- model.matrix(~x+y+z,data=d)
s <- svd(m)
zapsmall(s$d)
## [1] 828.8452   6.6989   2.6735   0.0000
## this tells us there is one collinear component
zapsmall(s$v)
##            [,1]       [,2]       [,3]       [,4]
## [1,] -0.0105005 -0.7187260  0.3872847  0.5773503
## [2,] -0.0054954 -0.4742947 -0.8803490  0.0000000
## [3,] -0.7017874  0.3692117 -0.1945349  0.5773503
## [4,] -0.7122879 -0.3495142  0.1927498 -0.5773503
## this tells us that the first (intercept), third (y),
## and fourth (z) column of the model matrix are
## involved in the collinear term, i.e.
## 1+y-z is zero
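A base-R alternative to inspecting the svd: pivoted qr() reports the numerical rank directly, and its pivot vector lets you extract a full-rank subset of columns mechanically. The same toy data as above, rebuilt so the snippet stands alone:

```r
set.seed(101)
d <- data.frame(x = runif(100), y = 1:100, z = 2:101)
m <- model.matrix(~ x + y + z, data = d)   # 1 + y - z == 0, so rank is 3
k <- qr(m)                  # pivoted QR computes the numerical rank
k$rank                      # 3 < ncol(m) = 4: one redundant column
m2 <- m[, k$pivot[seq_len(k$rank)], drop = FALSE]  # full-rank column subset
qr(m2)$rank == ncol(m2)     # TRUE: m2 can be fit without rank warnings
```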



[R] reading in stata file with read.dta works in R x64 3.0.1 and crashes R x64 3.0.2

2013-11-07 Thread Anthony Damico
this file


http://www.electionstudies.org/studypages/data/anes_mergedfile_1992to1997/anes_mergedfile_1992to1997_dta.zip

can be downloaded after free registration on this page

http://electionstudies.org/studypages/download/registration_form.php


imports properly in Windows R x64 3.0.1 but causes R x64 3.0.2 to crash


the crash occurs at the line

rval <- .External(do_readStata, file)


At the bottom of this e-mail, I have included code that will reproduce the
problem exactly in 3.0.2 but not 3.0.1.


And if I don't load the memisc and Hmisc packages, I get an error instead
of a crash:


> x <- read.dta( fp , convert.factors = FALSE )
Error in read.dta(fp, convert.factors = FALSE) :
  Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'


...but if I load both (as shown in the script below), R just dies.


=


sessionInfo for 3.0.1 (the version that works)

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252
LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid  splines   stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] memisc_0.96-6   MASS_7.3-26 lattice_0.20-15 Hmisc_3.12-2
Formula_1.1-1   survival_2.37-4 httr_0.2stringr_0.6.2
foreign_0.8-53

loaded via a namespace (and not attached):
[1] cluster_1.14.4 digest_0.6.3   RCurl_1.95-4.1 rpart_4.1-1
tools_3.0.1


=

sessionInfo in 3.0.2 right before the crash --

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid  splines   stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] memisc_0.96-6   MASS_7.3-29 lattice_0.20-23 Hmisc_3.12-2
[5] Formula_1.1-1   survival_2.37-4 httr_0.2stringr_0.6.2
[9] foreign_0.8-55

loaded via a namespace (and not attached):
[1] cluster_1.14.4 digest_0.6.3   RCurl_1.95-4.1 rpart_4.1-3
tools_3.0.2




=


# code that works in 3.0.1 and crashes in 3.0.2 --


setwd( "C:/my directory/anes")
your.username <- "em...@address.com"
your.password <- "your.password"




require(foreign) # load foreign package (converts data files into R)
require(stringr)# load stringr package (manipulates character strings
easily)
require(httr)# load httr package (downloads files from the web,
with SSL and cookies)
require(Hmisc) # load Hmisc package (loads spss.get function)
require(memisc)# load memisc package (loads spss portable table
import functions)

# construct a list containing the pre-specified login information
values <-
list(
"email" = your.username ,
"pass" = your.password
)

# contact the anes website to log in
POST( "http://www.electionstudies.org/studypages/download/login-process.php" ,
  body = values )

# download the `all_datasets` page to figure out what needs to be downloaded
z <- GET( "http://www.electionstudies.org/studypages/download/datacenter_all_datasets.php" )

# create a temporary file and a temporary directory
tf <- tempfile() ; td <- tempdir()

# download the damn file
z <- GET( "http://www.electionstudies.org/studypages/data/anes_mergedfile_1992to1997/anes_mergedfile_1992to1997_dta.zip" )

# save the result to a temporary file on the local disk
writeBin( z$content , tf )

# unzip that temporary file to an equally-temporary directory
z <- unzip( tf , exdir = td )

# find which one it is from among everything zipped up together..
fp <- z[ grep( 'dta' , z ) ]

# ..import that puppy
x <- read.dta( fp , convert.factors = FALSE )
# crash.




Re: [R] Multiple String word replacements: Performance Issue

2013-11-07 Thread jim holtman
Here is a start.  I was wondering how long it would take to at least
substitute 800 different patterns into 4M vectors.  Here is my test.
It took longer (99 sec) to create the test data than to do the
substitutions (52 sec).  Some variations on this can provide the
other information that you are probably after in far less than a day
(I would guess less than an hour).

> n <- 1000
> x <- paste0("$"
+ , sample(LETTERS, n, TRUE)
+ , sample(LETTERS, n, TRUE)
+ , sample(LETTERS, n, TRUE)
+ , sample(LETTERS, n, TRUE)
+ )
> x <- x[!duplicated(x)][1:800]
>
> n <- 400
> system.time({
+ output <- replicate(n, paste(sample(x,2), collapse = ' '))
+ })
   user  system elapsed
  99.85    0.22  100.37
>
> system.time({
+ pattern <- paste0("\\", x, collapse = "|")
+ z <- gsub(pattern, "[ticker]", output, perl = TRUE)
+ })
   user  system elapsed
  52.05    0.00   52.21
>
>
> str(output)
 chr [1:400] "$JHVN $VKOL" "$GTEU $CEGL" "$LOEY $ETQK" "$AFDO $SDLH" "$MOIN $WEVR" ...
> str(z)
 chr [1:400] "[ticker] [ticker]" "[ticker] [ticker]" "[ticker] [ticker]" ...
> str(pattern)
 chr 
"\\$MATF|\\$GFGC|\\$SRYC|\\$HLWS|\\$GHFB|\\$BGVU|\\$GFDW|\\$PSFN|\\$ONDY|\\$SXUH|\\$EBDJ|\\$YNQY|\\$NDBT|\\$TOQK|\\$IUBN|\\$VSMT"|
__truncated__
>
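One of those variations — the per-row ticker count that the original poster also wants — can reuse the same alternation pattern with gregexpr(). A small self-contained sketch (tiny toy data standing in for the 4M-row test above):

```r
pattern <- "\\$AAPL|\\$MSFT|\\$EBAY"
txt <- c("$AAPL is up +5%", "$MSFT , $EBAY doing great", "no tickers here")

# gregexpr returns one vector of match positions per element,
# with a single -1 when there is no match, so map that case to zero
m <- gregexpr(pattern, txt, perl = TRUE)
counts <- vapply(m, function(z) if (z[1] == -1L) 0L else length(z), integer(1))
counts  # 1 2 0
```

The matched strings themselves (for the "First Ticker" / "All Tickers" columns) come from the same `m` via regmatches(txt, m).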

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Wed, Nov 6, 2013 at 8:11 AM, Simon Pickert  wrote:
> Dear experts,
> I’ve been on this for weeks now and couldn’t find a solution. Sorry for the
> long description; I figured I'd post many details so you get the problem
> entirely, although it’s not hard to grasp.
>
> **Situation:**
> Data frame consisting of 4 million entries (total size: 250 MB). Two columns: 
> `ID` and `TEXT`. Text strings are each up to 200 characters.
>
>
> **Task:**
> Preprocessing the text strings
>
> Example Data:
>
> +-----+------------------------------------------+
> | ID  | Text                                     |
> +-----+------------------------------------------+
> | 123 | $AAPL is up +5%                          |
> | 456 | $MSFT , $EBAY doing great.  www.url.com  |
> | ... |                                          |
> +-----+------------------------------------------+
>
> Should become
>
> +-----+---------------------------------------------------+--------------+-------------+--------------+
> | ID  | Text clean                                        | First Ticker | All Tickers | Ticker Count |
> +-----+---------------------------------------------------+--------------+-------------+--------------+
> | 123 | [ticker] is up [positive_percentage]              | $aapl        | $aapl       | 1            |
> | 456 | [ticker] [ticker] doing great [url] [pos_emotion] | $msft        | $msft,$ebay | 2            |
> | ... |                                                   |              |             |              |
> +-----+---------------------------------------------------+--------------+-------------+--------------+
>
>
>
> **Problem:**
> It takes too long. On my 8GB RAM Dual-Core machine: Cancelled after 1 day. On 
> a 70GB 8-Core Amazon EC2 instance: Cancelled after 1 day.
>
>
> **Details:**
> I am basically
>
>  - Counting how often certain words appear in one string
>  - Write this number into a new column (COUNT)
>  - Replace this (counted) word
>  - Replace other words (which I don't need to count before)
>  - Replace some regular expressions
>
> The vectors which are used as patterns look like this:
>
> "\\bWORD1\\b|\\bWORD2\\b|\\bWORD3\\b|\\bWORD4\\b..."
>
> Thus, those 'replacement vectors' are character vectors of length 1, each 
> containing up to 800 words
>
>
>
> **Main code:**
>
> library("parallel")
> library("stringr")
>
> preprocessText<-function(x){
>
>   # Replace the 'html-and'
>   arguments<-list(pattern="\\&\\;", replacement="and", x=x, ignore.case=TRUE)
>   y<-do.call(gsub, arguments)
>
>   # Remove some special characters
>
>   arguments<-list(pattern="[^-[:alnum:]\\'\\:\\/\\$\\%\\.\\,\\+\\-\\#\\@\\_\\!\\?+[:space:]]", replacement="", x=y, ignore.case=TRUE)
>   y<-do.call(gsub, arguments)
>
>   # Lowercase
>   arguments<-list(string=y,pattern=tolower(rep_ticker))
>   first<-do.call(str_match,arguments)
>
>   # Identify signal words and count them
>   # Needs to be done in parts, because otherwise R can't handle this many at once
>   arguments<-list(string=x, pattern=rep_words_part1)
>   t1<-do.call(str_extract_all,arguments)
>
>   arguments<-list(string=x, pattern=rep_words_part2)
>   t2<-do.call(str_extract_all,arguments)
>
>   arguments<-list(string=x, pattern=rep_words_part3)
>   t3<-do.call(str_extract_all,arguments)
>
>   arguments<-list(string=x, pattern=rep_words_part4)
>   t4<-do.call(str_extract_all,arguments)
>
>   count=length(t1[[1]])+length(t2[[1]])+length(t3[[1]])+length(t4[[1]])
>   signal_words=c(t1[[1]],t2[[1]],t3[[1]],t4[[1]])
>
>
>   # Replacements
>
>   arguments<-list(pattern=rep_wordsA,re

Re: [R] Fitting multiple horizontal lines to data

2013-11-07 Thread Carl Witthoft
You already asked this on StackOverflow.
The answer remains the same,  pretty much what David W.  wrote:   this is
not a question about fitting lines to data.  You need to step back and think
about what message you want to deliver to those who will view your graph,
and what the meaning of your dataset is in the first place.




--
View this message in context: 
http://r.789695.n4.nabble.com/Fitting-multiple-horizontal-lines-to-data-tp4679891p4679952.html
Sent from the R help mailing list archive at Nabble.com.



[R] problem with interaction in lmer even after creating an "interaction variable"

2013-11-07 Thread a_lampei
Dear all,
I have a problem with interactions in lmer. I have 2 factors (garden and
gebiet) which interact, plus one other variable (home), dataframe arr. When
I put:
/
lmer (biomass ~ home + garden:gebiet +  ( 1|Block), data = arr)/

it writes:
/Error in lme4::lFormula(formula = biomass ~ home + garden:gebiet + (1 |  : 
  rank of X = 28 < ncol(X) = 30/

In the lmer help I found out that if not all combinations of the interaction
are realized, lmer has problems and one should create a new variable using
"interaction", which I did:
/
arr$agg <- interaction (arr$gebiet, arr$garden, drop = TRUE)/

when I fit the interaction term now:
 /
lmer (biomass ~ home + agg+  ( 1|Block), data = arr)/

The error does not change:
/
Error in lme4::lFormula(formula = biomass ~ home + agg + (1 | Block),  : 
  rank of X = 28 < ncol(X) = 29/

No NAs are in the given variables in the dataset.

Interestingly, it works when I include only the interaction, like

/lmer (biomass ~ agg +  ( 1|Block), data = arr)/

Even the following models work:
/lmer (biomass ~ gebiet*garden +  ( 1|Block), data = arr)
lmer (biomass ~ garden + garden:gebiet  +( 1|Block), data = arr)/

But if I add the interaction term in the new format of the new variable, it
reports the same error again.

/lmer (biomass ~ garden + agg  +( 1|Block), data = arr)/

If I use any other variable from the very same data frame (not only the
variable "home"), the error is reported again.

I do not understand it; the new variable is just another factor now, isn't
it? And it is in the same data frame and has the same length.

Does anyone have any idea?

Thanks a lot, Anna



--
View this message in context: 
http://r.789695.n4.nabble.com/problem-with-interaction-in-lmer-even-after-creating-an-interaction-variable-tp4679951.html
Sent from the R help mailing list archive at Nabble.com.



[R] Re: residuals of random model estimated by plm function

2013-11-07 Thread Millo Giovanni
Dear Alfonso,

in a RE model you do not explicitly estimate every single individual effect, 
but only the variance of the distribution they have been "drawn from". Hence 
the only (pointwise) residual you can estimate ex-post is the composite one: 
i.e., the sum.
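A quick way to see this in your own example (a sketch; it assumes the plm package and its bundled Produc data are available): residuals() on a random-effects fit returns one composite value per observation, with no separate u_i and e_it pieces.

```r
library(plm)
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          model = "random", data = Produc, index = c("state", "year"))

res <- residuals(zz)         # composite residual: u_i + e_it
length(res) == nrow(Produc)  # one value per observation, not per individual
```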

Best,
Giovanni 

Giovanni Millo, PhD
Research Dept.,
Assicurazioni Generali SpA
Via Machiavelli 3,
34132 Trieste (Italy)
tel. +39 040 671184
fax  +39 040 671160

-----Original Message-----
From: alfonso.carf...@uniparthenope.it [mailto:alfonso.carf...@uniparthenope.it] 
Sent: Wednesday, November 6, 2013 17:30
To: r-help@r-project.org
Cc: Millo Giovanni
Subject: residuals of random model estimated by plm function

Hi all,


I have estimated a random panel model using plm function.

I have a question about the vector of residuals obtained with the object
$residuals.

example:

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, model="random", 
data = Produc, index = c("state","year"))

res<-zz$residuals # vector of the residuals.

Is the vector res the sum of the idiosyncratic (eit) and individual
(ui) components, or only the idiosyncratic (eit) component?

Thanks
Alfonso





 
