[R] robust standard errors in maximum likelihood estimation; sandwich estimator for mle/mle2
Dear list,

After more reading, I can refine the rather broad question I asked yesterday and therefore ask a better one: I have written a function that returns log-likelihood values. The function has several free parameters (and is itself nonlinear). I use mle2 to find the maximum likelihood estimates of all free parameters. When I call summary() on the object returned by mle2, I get the maximum likelihood estimates, standard errors, the corresponding z-values, and Pr(z).

My problem: the data I fit the function to consist of repeated choices by multiple participants. This means I have to correct the standard errors shown by summary(), since they are calculated under the assumption that each choice is independent. From what I have read, I need the sandwich (i.e., Huber) estimator to obtain robust standard errors. This estimator is implemented in the R package "sandwich", but as far as I can tell, the package needs an object of class lm (or similar); an object returned by mle2 cannot be used with the package's functions. In Stata, maximum likelihood estimation with robust standard errors is easily done with the option "cluster(id)". Is there something similar in R?

Thank you for any advice,
Marc

Sent: Tuesday, 1 July 2014, 10:07
From: "Marc Jekel"
To: r-help@r-project.org
Subject: maximum likelihood estimation with clustered data

Dear list,

I am currently trying to fit the free parameters of a model from economics (cumulative prospect theory) by maximum likelihood estimation. I know how to do maximum likelihood estimation using mle or mle2 in R; the problem to which I could not find a solution is that my data are correlated (i.e., multiple participants, each with multiple responses), which needs to be accounted for in the estimation. In Stata, ML estimation can be done with clustered data (with the command "ml model ..., cluster(id)"), but I could not find an equivalent command in R.

More detail, in case someone has tried the same before: I am trying to implement an approach proposed by Glenn Harrison, who shows in Stata how to implement user-written maximum likelihood estimators for utility functions with clustered data ([1]).

Thank you for any hint,
Marc

References
1. http://faculty.cbpp.uaa.alaska.edu/jalevy/protected/HarrisonSTATML.pdf
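A note on the sandwich question above: one way to get cluster-robust standard errors without package support is to assemble the sandwich by hand, taking the "bread" from vcov() on the mle2 fit and the "meat" from per-observation score contributions summed within clusters. Below is a minimal sketch, not a definitive implementation. It assumes a hypothetical function negLL_i(theta) returning the vector of per-observation negative log-likelihood contributions, a cluster identifier id, and the fitted mle2 object fit; numDeriv supplies the numerical scores.

library(numDeriv)
library(bbmle)

## negLL_i, fit, and id are assumptions: supply your own per-observation
## negative log-likelihood, mle2 fit, and cluster identifier
theta.hat <- coef(fit)

## numerical score matrix: one row per observation, one column per parameter
S <- jacobian(negLL_i, theta.hat)

## sum the scores within each cluster, then form the "meat"
S.cl <- rowsum(S, group = id)
meat <- crossprod(S.cl)

## the "bread": vcov() on an mle2 fit is the inverse observed information
bread <- vcov(fit)

## clustered sandwich covariance matrix and robust standard errors
## (a finite-sample correction such as G/(G-1), G = number of clusters,
## is sometimes applied to the meat)
V.cl <- bread %*% meat %*% bread
robust.se <- sqrt(diag(V.cl))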
[R] R + memory of objects
Dear R community,

I am still struggling a bit with how R does memory allocation and how to optimize my code to minimize working-memory load. Simon (thanks!) and others gave me the hint to use gc() to clean up memory, which works quite nicely but appears to me to be more of a fix than a solution. To give you an impression of what I am talking about, here is a short code example together with rough measures (from a system tracking app) of the working memory needed at each computational step (latest 64-bit R on a 64-bit Windows 7 system, 2 cores, approx. 4 GB RAM):

## example 1:
y = matrix(rep(1, 5000), nrow = 5000/2, ncol = 2)
# used working memory increases from 1044 MB to 1808 MB

# the same command again:
y = matrix(rep(1, 5000), nrow = 5000/2, ncol = 2)
# 1808 MB to 2178 MB -- why does memory increase?

# give the matrix column names:
colnames(y) = c("col1", "col2")
# 2178 MB to 1781 MB -- why does memory use decrease when I assign column labels?

## example 2:
y = matrix(rep(1, 5000), nrow = 5000/2, ncol = 2)
# 1016 MB to 1780 MB
y = data.frame(y)
# 1780 MB to 3315 MB

Why does it take so much extra memory to store this matrix as a data.frame? It is not the object per se (i.e., that data.frames inherently need more memory), because after gc() memory use drops to 1387 MB. Does this mean it may be more memory-efficient to avoid data.frames and use only matrices?

This puzzles me a lot, and in my experience these effects are accentuated for larger objects. As an anecdotal comparison: I also used Stata in my last project because of these memory problems, and I could do a lot of manipulations of the same (!) data with significantly less memory (I am talking about gigabytes).

Best,
Marc
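One way to separate the true size of an object from process-level memory accounting is object.size(), with gc() reporting R's own view of its allocations. A small sketch along the lines of the example above:

m <- matrix(rep(1, 5000), nrow = 2500, ncol = 2)
object.size(m)         # size of the matrix itself, in bytes

d <- as.data.frame(m)
object.size(d)         # a data.frame adds names, row names, and list overhead

gc()                   # runs a collection and prints R's current and peak usage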
[R] memory allocation in R
Dear R community,

I was observing a memory issue in R (latest 64-bit R version running on a 64-bit Windows 7 system) that made me curious. I kept track of the memory my PC allocated to R to calculate and keep several objects in the workspace. If I then save the workspace, close R, and open the workspace again, less memory is allocated to hold the same set of variables. In my case, the reduction in memory was quite significant (approx. 2 GB). Does anyone know why R behaves in this manner? Put differently: what does R keep in the workspace beyond the objects themselves before I close it? And can I induce this reduction in memory without having to close R?

Thanks for an email!
Marc
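For what it's worth, one thing to try without restarting R is an explicit garbage collection: gc() frees unreferenced objects (whether the freed memory is returned to the operating system depends on the platform), and gc(reset = TRUE) resets the peak-usage counters so later reports reflect the current state of the session:

gc()               # run a full collection; the output shows used and peak memory
gc(reset = TRUE)   # additionally reset the "max used" statistics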
[R] how to apply several math/logic operations on columns/rows of a matrix
Dear R-Fans,

The more I work with matrices (and data.frames), the more I think it would be helpful to have functions that apply (several!) mathematical and/or logical operators column- or row-wise to a matrix. I know the function apply() and its relatives (e.g., lapply), but I do not think they help with solving, for example, the following task. Assume there is a 3x3 matrix:

1 2 4
4 5 3
1 3 4

How do I find, for each column separately, the position of the column's minimum without using loop commands? I could extract each column in a loop and use something like:

for (loopColumn in 1:3) {
  extractedColumnVector = myMatrix[, loopColumn]
  position = which(extractedColumnVector == min(extractedColumnVector))
  print(position)
}

I think there should be something simpler out there to handle these kinds of tasks (maybe there is and I just don't know it, but I checked several R books and could not find a command for this). It would be great to have a function in which one can define a sequence of commands to be applied column- or row-wise.

Thanks for a hint,
Marc

--
Dipl.-Psych. Marc Jekel
MPI for Research on Collective Goods
Kurt-Schumacher-Str. 10
D-53113 Bonn
Germany

email: je...@coll.mpg.de
phone: ++49 (0) 228 91416-852
http://www.coll.mpg.de/team/page/marc_jekel-0
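In fact, apply() does handle this particular task: passing which.min applies it to each column, and an anonymous function catches ties. A short example using the matrix from the question:

myMatrix <- matrix(c(1, 4, 1,    # column 1
                     2, 5, 3,    # column 2
                     4, 3, 4),   # column 3
                   nrow = 3)

apply(myMatrix, 2, which.min)
# first position of each column minimum: 1 1 2

apply(myMatrix, 2, function(x) which(x == min(x)))
# all positions including ties (returned as a list when columns differ in count)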
[R] file size plot pdf
Dear R-Fans,

I have lately been working on some plots in R that I save as PDF via the pdf() command. I have noticed that when I open those files in Adobe and then re-save them within Adobe ("Save as..."), the size of the PDF files decreases dramatically (e.g., from 4 MB to 1 MB). The same can be observed for smaller PDF files (though not as drastically). Does anyone know whether Adobe somehow compresses PDF files, and whether it is possible to do this already within R? (The quality of the PDFs is the same, as far as I can judge from visual inspection.)

Thanks for a hint,
Marc
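Two things worth trying from within R, both assuming a reasonably recent R installation: the compress argument of pdf(), which enables stream compression when the file is written, and tools::compactPDF(), which re-compresses an existing PDF using qpdf or Ghostscript if one of them is installed on the system. A sketch (the file name is illustrative):

pdf("myplot.pdf", compress = TRUE)   # stream compression; the default in recent R
plot(rnorm(100))
dev.off()

## post-process an existing file (requires qpdf or Ghostscript):
tools::compactPDF("myplot.pdf", gs_quality = "ebook")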
Re: [R] sensitivity logical operators in R
Hi again,

I have checked the same code (see the original message below) using MATLAB. It produces the same behavior (i.e., equal numbers are evaluated as unequal). Am I missing something?

Thanks for help!
Marc
[R] sensitivity logical operators in R
Hello R Fans,

Another question for the community that really frightened me today. The following logical comparison produces FALSE as output:

t = sum((c(.7, .69, .68, .67, .66) - .5) * c(1, 1, -1, -1, 1))
tt = sum((c(.7, .69, .68, .67, .66) - .5) * c(1, -1, 1, 1, -1))
t == tt

This is really strange behavior. Most likely it has something to do with how R represents numbers internally and the limited precision of computer arithmetic? Does anyone know when this behavior occurs and how to work around it?

Thank you all! This list is a pleasure!!!
Marc
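Both sums equal 0.2 algebraically, but the intermediate roundings in binary floating point differ, so the two doubles differ in their last bits. The usual remedy is to compare with a tolerance rather than with ==, for instance:

t  = sum((c(.7, .69, .68, .67, .66) - .5) * c(1,  1, -1, -1,  1))
tt = sum((c(.7, .69, .68, .67, .66) - .5) * c(1, -1,  1,  1, -1))

t - tt                                  # a tiny nonzero difference, on the order of 1e-17
isTRUE(all.equal(t, tt))                # TRUE: all.equal() compares with a numeric tolerance
abs(t - tt) < .Machine$double.eps^0.5   # the same idea written out by hand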
[R] which operating system + computer specifications lead to the best performance for R?
Dear R Fans,

I have the opportunity to buy a new computer for my simulations in R. My goal is to make the execution of R code as fast as possible. I know that the number of cores and the amount of working memory are crucial for performance, but maybe someone has experience or knowledge about which computer specifications matter most, especially in relation to R. Is there any knowledge on the performance of R under different operating systems (Linux, Windows, Mac, etc.), or put differently: does performance depend on the operating system at all? Even small differences in performance (i.e., speed of calculations) matter to me (quite large datasets, repeated calculations, etc.).

Thank you for any hint, it is appreciated!

Best,
Marc
[R] R performance
Dear R Fans,

I was recently asking myself how fast R is at executing code. I have been running simulations lately that take quite some time to execute, and I was wondering whether it is reasonable at all to do more computationally intensive projects in R. Of course, one can compare execution times for the same code written in several languages, but maybe someone already has experience on the subject?

Thanks for a reply,
Marc
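For concrete numbers on a given machine, system.time() gives a quick benchmark of any expression. Comparing an explicit loop against its vectorized equivalent shows where R's interpreter overhead matters most, for example:

## an explicit loop:
system.time({
  x <- numeric(1e6)
  for (i in seq_len(1e6)) x[i] <- sqrt(i)
})

## the vectorized equivalent, typically far faster:
system.time(y <- sqrt(seq_len(1e6)))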
[R] r bug (?) display of data
Hi R Fans,

I stumbled across a strange (I think) bug in R 2.9.1. I read in a data file with 5934 rows and 9 columns with the command:

daten = data.frame(read.table("C:/fussball.dat", header = TRUE))

Then I needed a subset of the data file:

newd = daten[daten[, 1] != daten[, 2], ]

Two rows do not meet the logical condition and are dropped. The strange thing about it: when I print newd in the R console, the output still shows row labels up to 5934. When I check the number of rows with NROW(newd), I get 5932 as output. When I print newd[5934, ], I get NAs. When I print newd[5932, ], I get the row that is labelled 5934 when I just type in newd. This is totally crazy! Has anyone had the same problem?

Thanks for a post.
Marc
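For the record, this is standard data.frame behavior rather than a bug: subsetting keeps the original row names, so the last printed label can exceed the actual row count, while newd[i, ] indexes by position. A small sketch (with made-up data) that reproduces the effect:

daten <- data.frame(a = c(1, 2, 2, 4), b = c(1, 3, 2, 5))
newd  <- daten[daten[, 1] != daten[, 2], ]   # drops rows 1 and 3

newd                     # printed row names are "2" and "4", inherited from daten
NROW(newd)               # 2 rows
newd[4, ]                # positional index past the last row -> all NA
newd["4", ]              # indexing by row *name* still finds the row
rownames(newd) <- NULL   # renumber rows 1..n if desired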
[R] small numbers
Dear R Fans,

I have a simple problem but cannot find any reference to the solution. I want to do calculations with very small numbers (for maximum likelihood estimations). The smallest value R stores by default is about 1e-323; a smaller number like 1e-324 is stored as 0. How can I circumvent this problem? Is there a way to define how small a number can be in R?

Thanks for a reply in advance,
Marc
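The limit here is the double-precision floating-point format itself (denormals bottom out near 5e-324), not an R setting. The standard workaround for likelihoods is to compute on the log scale throughout: most density functions take a log argument, and sums of log densities replace products of densities. For example:

.Machine$double.xmin   # smallest positive *normalized* double, about 2.2e-308
1e-323                 # still representable, as a denormal
1e-324                 # underflows to exactly 0

## for likelihoods, work with logs directly:
x <- rnorm(10)
log.lik <- sum(dnorm(x, mean = 0, sd = 1, log = TRUE))   # no underflow, even for many terms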