Now that we know what egen is, the answers are one-liners in R:

# Make up some data
vasdat <- matrix(sample(1:100, 3000, replace = TRUE), ncol = 3)

# Use apply for each (MARGIN = 1 means rows, 2 means columns)
# 0:10 matches the values(0,1,...,10) list in the original Stata call
anycountresult <- apply(vasdat, MARGIN = 1, FUN = function(x) sum(x %in% 0:10))
rowtotalresult <- apply(vasdat, MARGIN = 1, FUN = sum)

# Combine results with original data
egentyperesults <- cbind(vasdat, anycountresult, rowtotalresult)

# Display first ten rows of the data
head(egentyperesults, 10)

----- Original message -----
From: "David Winsemius" <dwinsem...@comcast.net>
To: "Stas Kolenikov" <skole...@gmail.com>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Date: Thu, 16 Apr 2009 13:39:44 -0400
Subject: Re: [R] Equivalent to Stata egen

Terse is OK by me as long as I get told what goes in (allowable data
types, argument names and effects) and what comes out. What seemed to
be lacking in that Stata doc for egen was a description of the purpose
or behavior, and I could then find no description of the values
produced. Perhaps it is because Stata has an approach that everything
is a rectangular array? Is everything assumed to create a new column
of data as in SAS?

At any rate it looked to this casual non-user, reading that document,
that egen creates a new variable aligned with its argument variables
by applying various functions within groupings. That is pretty much
what ave does. "ave" is not restricted to mean as a functional
argument. As I said, it was a guess.

The texts I used to get up to speed in R are several downloaded from
the Contributed documents (including anything written by Venables),
V&R MASS v 2, Harrell's RMS, Sarkar's Lattice, Chambers & Hastie's
SMiS, and reading a lot of Q&A on this list.

-- 
David Winsemius

On Apr 16, 2009, at 11:57 AM, Stas Kolenikov wrote:

> http://www.stata.com/help.cgi?egen -- it creates new variables dealing
> with some special relatively non-standard tasks that don't boil down
> to one-line arithmetic expressions. For that reason, there will be
> no equivalent to -egen- in general, as it has so many functions that
> are so different.
> -rowtotal- is of course just a shorthand for sum(),
> except for treatment of missing values (ifelse(is.na(x), 0, x)). But
> -anycount- is a moderately complicated double cycle over variables and
> list of values (40 lines of underlying Stata code, including parsing
> and labeling the resulting variables)... which will probably become a
> triple R cycle including the cycle over observations, although the
> latter can probably be avoided.
>
> Yes, R documentation looks extremely terse to me as a regular Stata
> user. I am used to seeing the concepts explained well, even in the
> help files, and certainly more so in the shelved books. As every
> option and every part of the syntax is given at least three to five
> sentences, and the most common uses are exemplified, I can usually
> figure out how to run a particular task relatively quickly. (The data
> management tricks, which is what Peter was asking about above, are
> probably an exception: you either know them, or you don't. In this
> example, I don't know the corresponding R tricks, although I can
> probably brute force the solution if I needed to.) The fraction of
> commands in R that I personally have been coming across that are
> comparably well documented is about a quarter. For the others, it is
> either guesswork+CRANning+googling around or "Forget it, I'll just go
> back to Stata to do it" after a few futile attempts. Maybe I just don't
> know where to look for the good stuff, but it is certainly outside R
> as a package+its documentation.
>
> On 4/15/09, David Winsemius <dwinsem...@comcast.net> wrote:
>> Peter Kraglund Jacobsen <peter <at> kraglundjacobsen.dk> writes:
>>
>>> What are the R equivalents to the Stata command egen?
>>>
>>> egen temp = anycount(t0vas t30vas t60vas t120vas t240vas t360vas),
>>> values(0,1,2,3,4,5,6,7,8,9,10)
>>> egen temp2 = rowtotal(t0vas t30vas t60vas t120vas t240vas t360vas)
>>
>> And people call R documentation cryptic!
>> As far as I can tell the corresponding function would be ave, but
>> that is only a guess since there really is not much help regarding
>> egen's purpose from the voluminous Stata documentation.
>>
>> --
>> David Winsemius
>>> ______________________________________________
>>> R-help <at> r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
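
Two points from the thread can be illustrated with a short base R sketch: the remark that -rowtotal- differs from sum() only in treating missing values as zero, and the remark that ave is not restricted to mean. The small matrix, the `group` vector and the `groupmax` name below are made up for illustration, and the claim about how Stata handles missing values is taken from the message above rather than checked against Stata itself.

```r
# Made-up data with some missing values (hypothetical, for illustration only)
vasdat <- matrix(c( 1, NA, 15,
                    0, 10, 11,
                   NA, NA,  3,
                   50, 60, 70),
                 ncol = 3, byrow = TRUE)

# rowtotal-like: missing values contribute 0 to the row sum
rowtotalresult <- rowSums(vasdat, na.rm = TRUE)

# anycount-like: number of entries per row that fall in 0:10.
# %in% drops the matrix shape, so it is restored before summing by row;
# NA never matches a value in 0:10, so missing entries count as 0.
anycountresult <- rowSums(matrix(vasdat %in% 0:10, nrow = nrow(vasdat)))

# ave() with a function other than mean: the group-wise maximum,
# returned as a vector aligned with the original rows
group <- c("a", "a", "b", "b")   # hypothetical grouping variable
groupmax <- ave(rowtotalresult, group, FUN = max)

cbind(vasdat, anycountresult, rowtotalresult, groupmax)
```

rowSums avoids the explicit apply() call and takes na.rm directly; the apply() version in the reply at the top would need sum(x, na.rm = TRUE) to behave the same way on data with missing values.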