[R] Clinical Trial data sets in public domain?
Is anybody using R to analyze clinical trial datasets that have been put in the public domain (these are very hard to find)? I mean not just a single data table, but the actual database: a handful of data tables with one-to-one or many-to-one relationships. [For example, "Adverse Events" and "Patient Info" are two datasets with a many-to-one relationship; the "Patient Info" dataset has precisely one row for each patient who received a dose of study drug.]

Robert Wilkins

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
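To make the many-to-one relationship concrete, here is a minimal sketch in base R. The two toy tables and their column names (patient_id, treatment, preferred_term) are assumptions for illustration only, not taken from any real trial database.

```r
# Hypothetical toy tables; names and values are invented for illustration.
patient_info <- data.frame(
  patient_id = c(101, 102, 103),
  treatment  = c("Placebo", "DrugA", "DrugB"),
  stringsAsFactors = FALSE
)
adverse_events <- data.frame(
  patient_id     = c(101, 101, 103),
  preferred_term = c("Headache", "Nausea", "Headache"),
  stringsAsFactors = FALSE
)

# The "one" side of a many-to-one relationship must have a unique key.
stopifnot(!anyDuplicated(patient_info$patient_id))

# Left join from patients to events: patients without any event are kept,
# with preferred_term set to NA.
ae_with_info <- merge(patient_info, adverse_events,
                      by = "patient_id", all.x = TRUE)

# Number of events per patient, including patients with zero events.
events_per_patient <- tapply(!is.na(ae_with_info$preferred_term),
                             ae_with_info$patient_id, sum)
```

Checking the key before merging is the cheap safeguard here: a duplicated patient row would silently multiply the event rows.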
Re: [R] Data cleaning & Data preparation, what do R users want?
Dominik (and others),

If it is indeed still the biggest pain point, even in 2017, then maybe we can do something about that, with more effort on different user interface designs and try-outs of them on specialized datasets. [In some specialties, such as clinical trials, it is hard to get access to public domain datasets, and having to fall back on a tiny "toy" dataset, which nobody will pay attention to, makes this harder.]

It would help if academia (both comp-sci and statistics departments) would support those who invest resources in drafting and test-driving new product designs. If, in the year 2017, it is still a big pain point, doesn't that make sense? More speculative work in statistical programming language design has not been a priority in academia since before 1980.

On Thu, Nov 30, 2017 at 4:11 AM, Dominik Schneider <dominik.schnei...@colorado.edu> wrote:

> I would agree that getting data into R from various sources is the biggest pain point. Even if there is an API, the results are not always consistent and you have to do lots of dimension checking to get it right. Or there isn't an open API at all and you have to hack it by web scraping or otherwise: http://enpiar.com/2017/08/11/one-hour-package/
>
> On Thu, Nov 30, 2017 at 1:00 AM, Jim Lemon wrote:
>
>> Hi again,
>> Typo in the last email. Should read "about 40 standard deviations".
>>
>> Jim
>>
>> On Thu, Nov 30, 2017 at 10:54 AM, Jim Lemon wrote:
>> > Hi Robert,
>> > People want different levels of automation in the software they use. What concerns many of us is the desire for the function "figure-out-what-this-data-is-import-it-and-get-rid-of-bad-values". Such users typically want something that justifies its use by being written by someone who seems to know what they're doing and by being used by lots of other people. One advantage of many R functions is their modular construction. This encourages users to at least consider the steps that are taken rather than just accept what comes out of that long tube.
>> >
>> > Take the contentious problem of outlier identification. If I just let the black box peel off some values, I don't know what I have lost. On the other hand, if I import data and examine it with a summary function, I may find that one woman has a height of 5.2 meters. I can range check by looking up the Guinness Book of Records. It's an outlier. I can estimate the probability of such a height. Hmm, about 4 standard deviations above the mean. It's an outlier. I can attempt a Sherlock Holmes. "Watson, I conclude that an imperial measure (5'2") has been recorded as a metric value". It's not an outlier.
>> >
>> > The more R gravitates toward "black box" functions, the more some users are encouraged to let them do the work. You pays your money and you takes your chances.
>> >
>> > Jim
>> >
>> > On Thu, Nov 30, 2017 at 3:37 AM, Robert Wilkins wrote:
>> >> R has a very wide audience, clinical research, astronomy, psychology, and so on and so on.
>> >> I would consider data analysis work to be three stages: data preparation, statistical analysis, and producing the report.
>> >> This regards the process of getting the data ready for analysis and reporting, sometimes called "data cleaning" or "data munging" or "data wrangling".
>> >>
>> >> So as regards tools for data preparation, speaking to the highly diverse audience mentioned, here is my question:
>> >>
>> >> What do you want?
>> >> Or are you already quite happy with the range of tools that is currently before you?
>> >>
>> >> [BTW, I posed the same question last week to the r-devel list, and was advised that r-help might be a more suitable audience by one of the moderators.]
>> >>
>> >> Robert Wilkins
Re: [R] Data cleaning & Data preparation, what do R users want?
Christopher,

OK, well what about a range of functions in an R package that automatically, with very little syntax, pull in data from a variety of formats (CSV, SQLite, and so on) and convert them to an R data frame? You seem to be pointing to something like that. Something like that, in some form or another, probably already exists, though it might be imperfect (not as user-friendly as possible), not well publicised, or both.

Or another tangent: your co-workers are not going to stop using Excel, whether you like it or not, and many end users are stuck in the exact same position as you (co-workers who deliver the data in Excel). I will guess that data stored in Excel tends to be dirty in somewhat predictable ways. (And again, those other end users' co-workers are not going to change their behaviour.) And so: a data munging tool that makes it as easy as possible to clean up the data in Excel spreadsheets and export them to R data frames. One prerequisite: an understanding of what tends to go wrong with data in Excel (the data in Excel tends to be dirty, but dirty in what way?).

Thank you for your response, Christopher. What state are you in?

On Wed, Nov 29, 2017 at 11:52 AM, Christopher W. Ryan wrote:

> Great question. What do I want? I want my co-workers to stop using Excel spreadsheets for data entry, storage, and sharing! I want them to understand the value of data discipline. But alas . . . .
>
> I work in a county health department in the US. Between dplyr, stringr, grep, grepl, and the base R read() functions, I'm doing OK.
>
> I need to learn more about APIs, so I can see if I can make R directly grab data from, e.g., our state health department sources. My biggest hassle is having to download a data file, save it somewhere, and then open R and read it in. I'd like to be able to do it all in R. Would make the generation of recurring reports easier.
>
> --Chris Ryan
>
> Robert Wilkins wrote:
> > R has a very wide audience, clinical research, astronomy, psychology, and so on and so on.
> > I would consider data analysis work to be three stages: data preparation, statistical analysis, and producing the report.
> > This regards the process of getting the data ready for analysis and reporting, sometimes called "data cleaning" or "data munging" or "data wrangling".
> >
> > So as regards tools for data preparation, speaking to the highly diverse audience mentioned, here is my question:
> >
> > What do you want?
> > Or are you already quite happy with the range of tools that is currently before you?
> >
> > [BTW, I posed the same question last week to the r-devel list, and was advised that r-help might be a more suitable audience by one of the moderators.]
> >
> > Robert Wilkins
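On the "download, save, then read" hassle: base R's read.csv() accepts a URL as well as a local path, so the manual download step can often be skipped. The sketch below is only illustrative; the wrapper name read_source and the commented-out URL are hypothetical, and a temporary file stands in for the remote source so the example runs without network access.

```r
# Hypothetical helper: read a CSV from either a URL or a local path.
read_source <- function(src, ...) {
  # read.csv() opens "http://", "https://" and "file://" sources directly.
  read.csv(src, stringsAsFactors = FALSE, ...)
}

# A temporary local file stands in for the remote data source.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(county = c("A", "B"), cases = c(3, 5)),
          tmp, row.names = FALSE)
dat <- read_source(tmp)

# With a real endpoint this would become (placeholder URL, an assumption):
# dat <- read_source("https://example.org/state-health/cases.csv")
```

A recurring report script can then start from the read_source() call, with no manual file handling in between.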
[R] Data cleaning & Data preparation, what do R users want?
R has a very wide audience: clinical research, astronomy, psychology, and so on. I would consider data analysis work to be three stages: data preparation, statistical analysis, and producing the report. This question regards the first stage, the process of getting the data ready for analysis and reporting, sometimes called "data cleaning", "data munging", or "data wrangling".

So as regards tools for data preparation, speaking to the highly diverse audience mentioned, here is my question: What do you want? Or are you already quite happy with the range of tools that is currently before you?

[BTW, I posed the same question last week to the r-devel list, and was advised by one of the moderators that r-help might be a more suitable audience.]

Robert Wilkins
[R] Best way to study internals of R ( mix of C, C++, Fortran, and R itself)?
How difficult is it to get a good feel for the internals of R, if you want to learn the general code base, but also the CPU-intensive parts (much of it in C or Fortran?) and the ways in which the general code and the CPU-intensive parts are connected together? R has a very large audience, but my understanding is that only a small group has a good understanding of the internals (and some of those will eventually move on to something else in their careers, or retire altogether).

While I'm at it, a second question: 15 years ago, nobody would ever offer a job based on R skills (SAS, yes; SPSS, maybe; but R skills, year after year, did not imply job offers). How much has that changed, both for R and for NumPy/Pandas/SciPy?

Thanks in advance,
Robert
[R] How would you program an Adverse Events statistical table using R code?
A graph != a table. I'm talking about a page full of summary statistics and advanced statistics, with lots of cross categories on the top and left margins of the table, as opposed to a visual display with an x-axis and y-axis, which is totally different. (An example of how this is done in another language is available at http://fivetimesfaster.blogspot.com )

For an AE table, you have an N and % column for every treatment group, and for all patients combined. On the right side, there is a categorical p-value (chi-square or Fisher's) for every preferred term (every row! Forget multiple testing issues; this is what the boss is asking for, and it's ad-hoc safety analysis).

The rows are:

- A row for the grand total N for each group.
- A row for N and % of patients with any event (regardless of body system and preferred term).
- For each body system, a section of rows that includes:
  - a row for N and % of patients with any event (this body system);
  - a row for N and % of patients who do NOT have an event (this body system);
  - within the body system, a row for each preferred term (again N and % for each group, and also the p-value).

Body system and preferred term are, of course, the broad medical category and the specific medical category.

In the pharma industry, they use the SAS programming language. Each table often needs several hundred lines of code. Essentially it's a combination of analysis and (visual) reporting mixed together, with some prerequisite data transformation. (And yes, with this new language, it can be done in under 20 lines of code.)

I have not seen people discuss attempts to do such things with the R programming language, or how successful such attempts have been. How hard is it, and how much code is it?

In general, we are talking about a variety of complex, somewhat non-homogeneous statistical tables with a variety of different row sections and row categories, different column sections and column categories, a mixture of summary statistics and advanced statistics (p-value, least-squares mean, etc.), and sometimes statistics from different statistical procedures on the same page.

Robert Wilkins
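As a rough indication of what the preferred-term rows might look like in base R, here is a minimal sketch. It covers only the "N (%) per group plus Fisher p-value" rows, not the body-system sections or page layout, and the data, column names, and the helper ae_row are all hypothetical.

```r
# Hypothetical data: one row per patient, one row per adverse event.
patients <- data.frame(
  id  = 1:8,
  trt = rep(c("Placebo", "DrugA"), each = 4),
  stringsAsFactors = FALSE
)
events <- data.frame(
  id   = c(1, 5, 5, 6, 7, 8),
  term = c("Headache", "Headache", "Nausea", "Headache", "Nausea", "Headache"),
  stringsAsFactors = FALSE
)

# One table row for a single preferred term (helper name is an assumption).
ae_row <- function(term, patients, events) {
  # Patients with at least one event of this preferred term.
  has_event <- patients$id %in% events$id[events$term == term]
  tab <- table(event = factor(has_event, levels = c(TRUE, FALSE)),
               trt   = patients$trt)
  n   <- tab["TRUE", ]
  pct <- 100 * n / colSums(tab)
  cells <- as.list(setNames(sprintf("%d (%.1f%%)", n, pct), colnames(tab)))
  as.data.frame(
    c(list(term = term), cells,
      list(all = sprintf("%d (%.1f%%)", sum(n), 100 * sum(n) / sum(tab)),
           p_fisher = fisher.test(tab)$p.value)),
    stringsAsFactors = FALSE
  )
}

# Stack one row per preferred term.
ae_table <- do.call(rbind, lapply(sort(unique(events$term)), ae_row,
                                  patients = patients, events = events))
```

The denominators come from the patient table rather than the event table, so patients with no events still count toward each group's N.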
[R] Boston/Cambridge -- Statistical Programming Language Technology Breakthroughs
If you are a statistician or researcher working in Boston/Cambridge, and you have a strong interest in breakthroughs in statistical programming language technology, contact me.

Robert
[R] Statistical Tables Really Fast
A new language that can produce complex statistical tables far faster, with much less code and effort, than any previous statistical programming language. A version that outsources (hands work off) to vilno data transformation and R is already in beta; a version that outsources to SAS/BASE and SAS/STAT is not yet in beta. http://fivetimesfaster.blogspot.com

Robert
[R] how to loop thru a matrix or data frame , and append calculations to a new data frame?
How do you do a double loop through a matrix or data frame and, within each iteration, do a calculation and append the result to a new, second data frame? (So, if your original matrix or data frame is 4 x 5, then 20 calculations are done, and the new data frame, which had 0 rows to start with, now has 20 rows.)
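One possible approach, sketched in base R: the squaring step is only a stand-in for whatever calculation is actually needed, and the column names row, col, and value are assumptions.

```r
m <- matrix(1:20, nrow = 4, ncol = 5)

# Option 1: explicit double loop. Collect one-row data frames in a list and
# bind them once at the end, rather than growing a data frame inside the loop.
rows <- vector("list", length(m))
k <- 0
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    k <- k + 1
    rows[[k]] <- data.frame(row = i, col = j, value = m[i, j]^2)
  }
}
result <- do.call(rbind, rows)

# Option 2: no loop. Enumerate all (row, col) pairs, then index the matrix
# with a two-column index matrix.
idx <- expand.grid(row = seq_len(nrow(m)), col = seq_len(ncol(m)))
result2 <- data.frame(idx, value = m[as.matrix(idx)]^2)
```

Both versions give 20 rows for a 4 x 5 input; they differ only in row order (the loop runs row by row, expand.grid varies the row index fastest).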
[R] syntax for estimable(gmodels package) and glht(multcomp package)
Hello,

I have a question as to how the syntax for glht (package multcomp) and estimable (package gmodels) works, since I'm not getting everything from the documents I've googled so far, especially for models with second-order terms.

A modestly complex model: two-way ANOVA with one continuous covariate, no random effects (and no repeated measures), to keep it modestly complex:

Y = treatmentgroup + sex + treatmentgroup*sex + weight

Treatment has 3 levels: "Placebo", "DrugA", "DrugB". Sex has 2 levels.

I want to do pairwise comparison(s) for one of the main effects, say "DrugB" - "Placebo", and a pairwise comparison at the cell-wise level, for example "Female:DrugA" - "Female:Placebo" or "Female:DrugA" - "Male:DrugA".

The second request is not ambiguous, since it's a difference of two cells (although the syntax for this request might be simplified if the main first-order effects are constrained to zero). And suppose the marginal sums of the second-order terms sum to zero, both down and across; that should make the first request unambiguous.

Two things:

1: People on the mailing list are having difficulties dealing with interaction terms with both functions (I see from googling), and the available PDFs don't explicitly deal with these cases.

2: Specifying the desired estimate with actual categorical levels in the calling syntax would be really nice, i.e. "Female:DrugA" - "Male:DrugA", instead of something like ( 0 0 1 -1 0 0 ), which to me is less intuitive and more prone to error. One of the PDFs on the internet seems to suggest that estimable can do this sort of thing for first-order terms, but whether this extends to two-way terms is not clear.

Thanks for your time.
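This does not answer the glht/estimable syntax question itself, but one way to get contrasts addressed by level names is to refit the same model in cell-means form and build the contrast vector by name in base R. The sketch below uses simulated data; the helper named_contrast is hypothetical, and reading "DrugB - Placebo" as an equal-weight average over the two sexes is an assumption about what the main-effect comparison should mean.

```r
set.seed(1)
# Simulated data; variable names mirror the model described above.
d <- expand.grid(treatment = c("Placebo", "DrugA", "DrugB"),
                 sex = c("Female", "Male"),
                 rep = 1:5)
d$weight <- rnorm(nrow(d), mean = 70, sd = 10)
d$y <- rnorm(nrow(d)) + 0.02 * d$weight
d$cell <- interaction(d$sex, d$treatment, sep = ":")

# Cell-means coding: one coefficient per sex:treatment cell, plus the covariate.
# This spans the same model as y ~ treatment * sex + weight.
fit <- lm(y ~ 0 + cell + weight, data = d)

# Hypothetical helper: mean of the 'plus' cells minus mean of the 'minus' cells.
named_contrast <- function(fit, plus, minus) {
  b <- coef(fit)
  stopifnot(all(paste0("cell", c(plus, minus)) %in% names(b)))
  k <- setNames(numeric(length(b)), names(b))
  k[paste0("cell", plus)]  <-  1 / length(plus)
  k[paste0("cell", minus)] <- -1 / length(minus)
  est  <- sum(k * b)
  se   <- sqrt(drop(t(k) %*% vcov(fit) %*% k))
  tval <- est / se
  c(estimate = est, se = se,
    p = 2 * pt(abs(tval), df.residual(fit), lower.tail = FALSE))
}

# Cell-wise comparison, written with level names.
cellwise <- named_contrast(fit, "Female:DrugA", "Female:Placebo")

# Main-effect comparison, averaging equally over sex (an assumption).
main_eff <- named_contrast(fit,
                           plus  = c("Female:DrugB", "Male:DrugB"),
                           minus = c("Female:Placebo", "Male:Placebo"))
```

The named vector k built inside the helper is the same kind of object as the ( 0 0 1 -1 0 0 ) vector mentioned above, just filled in by name rather than by position, which is where the error-proneness goes away.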
Re: [R] installing R on Ubuntu, can ignore warning messages?
It does, thank you. I was able to understand enough of it to do the install successfully. I am still trying to understand the later paragraphs, such as those on install.packages() and the r-cran-foo build dependencies. (The site you pointed me to is the same site I printed out yesterday to try to do an install; the README file prints to 3 pages.)

Is there an easy way to:

1: List the R-related packages and add-ons that are already installed? No point in trying to install what you already have!
2: List the R-related packages and add-ons that are available? Probably a big number of them?

Also, people who try Ubuntu out for the first time could be thrown for a loop by the unusual way it handles the root account: https://help.ubuntu.com/community/RootSudo

Thanks again.

On Wed, Oct 14, 2009 at 10:38 PM, Ista Zahn wrote:

> Hi,
> Instructions for authenticating the CRAN repositories are here: http://cran.r-project.org/bin/linux/ubuntu/
>
> r-base comes with whatever the base R libraries are (stats, graphics, etc.). I don't know if MASS in particular is in base because I don't use it directly.
>
> As far as I know it's safe to ignore the warnings, but they annoy me so I always follow the instructions linked above.
>
> The packages regularly updated in the CRAN repo are also listed on the webpage linked above.
>
> A couple of further tips:
> 1) I usually install packages with sudo aptitude install r-cran-xxx and then make sure they are up to date by running update.packages() in R.
> 2) You can also install packages using the regular install.packages() in an R session.
>
> Hope that helps,
> -Ista
>
> On Wed, Oct 14, 2009 at 10:11 PM, robstdev wrote:
>> Installing R on Ubuntu 8.10 (using sudo apt-get install r-base, and using one of the CRAN sites (cran.cnr.berkeley.edu)),
>> the installation process says something about not having some gpg public key and "are you sure you want to download non-authenticated stuff [y/n]" (to which I answered yes). I'm assuming this warning can be ignored?
>>
>> Also: even though the Ubuntu install and online update did a GCC install the other day, the R installation did an update of some GCC files, which I thought was odd. Probably I can ignore that too.
>>
>> Once you've installed R, does that automatically include some data examples (such as that MASS library)? Or does that require further downloads?
>>
>> Also, thanks for the previous tips
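For the two listing questions, a minimal sketch from inside an R session, using only base functions. The vector wanted is a made-up example, and the available.packages() call is left commented out because it needs network access and a configured repository.

```r
# 1: Packages already installed in the libraries on .libPaths().
inst <- installed.packages()
head(inst[, c("Package", "Version")])

# 2: Packages available from the configured repositories (needs network).
# avail <- available.packages()
# nrow(avail)

# Which of a hypothetical wish list still needs installing?
wanted     <- c("stats", "MASS")
to_install <- setdiff(wanted, rownames(inst))
# if (length(to_install) > 0) install.packages(to_install)
```

On a Debian or Ubuntu system, packages installed through r-cran-xxx system packages and those installed with install.packages() should both show up in installed.packages(), since it scans every library directory on .libPaths().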
[R] Installing R on Ubuntu ( 8.10 ) ?
Installing on Ubuntu: how do you do it, and have people found it to be glitchy? Which is easier, a binary install or an install from source? With the source install, are you less likely to have a dependency issue? (Ubuntu does the GCC install seamlessly, but has no mention of R.)
[R] easy way to find all extractor functions and the datatypes of what they return
Am I asking for too much? For any object that a stat procedure returns ( y <- lm( y~x ), etc. ), is there a super convenient function like give_all_extractors( y ) that lists all extractor functions, the datatype returned, and a text descriptor field ("pairwisepval", "lsmean", etc.)? That would just be so convenient.

What are my options for querying an object so that I can quickly learn the extractor functions to pull out the data and manipulate it?

Will the datatypes returned usually be named vectors and named matrices, indexed by categorical values in the data ("Male", "Female", "Placebo", "DrugB", etc.)? If they are indexed by 1, 2, 3, 4, it's easier to lose track.

Thanks a bunch in advance.
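There is no give_all_extractors() in base R as far as this sketch assumes, but a few base functions get close. The example uses the built-in cars dataset; the short list of extractor names is hand-picked for illustration.

```r
fit <- lm(dist ~ speed, data = cars)

# S3 methods registered for the object's class: the closest thing to a
# list of available extractors.
methods(class = class(fit)[1])

# Components stored in the fitted object itself.
names(fit)
str(fit, max.level = 1)

# Class of the value returned by a few common extractors.
extractors <- c("coef", "residuals", "fitted", "vcov", "confint")
returned   <- sapply(extractors, function(f) class(do.call(f, list(fit)))[1])

# Extractors generally return named objects, labelled by model terms.
coef_names <- names(coef(fit))
```

For models with factor predictors the coefficient names are built from the factor name and level (for example a name like "sexMale"), so results are labelled by category rather than by position.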
Re: [R] Installing R on Suse 11.1 , cannot figure it out
I believe I did last night: a CRAN site in Pittsburgh, with an "Install" link that I believe you are referring to. It just didn't work, unless you are referring to a different web site. The R site eventually leads to a list of install choices (download from different locations, such as Michigan, Pittsburgh, etc.). Your example is Iowa?

Not having GCC preinstalled (C and Fortran) might be a factor. When you install Linux, it should just install GCC, just like that. I mean, that's just wrong. This blog post sheds some light, maybe: http://www.flexbeta.net/main/articles.php?action=show&id=70&perpage=1&pagenum=5

When you install from source (which I can't, because I can't figure out how to install GCC), does the source install have binary dependencies?

On Wed, Oct 7, 2009 at 9:10 PM, Cedrick W. Johnson wrote:

> see below:
>
> Robert Wilkins wrote:
>> Can't figure out how the install works, it is certainly not automatic.
>> Also, the "Install" option on the R web site for Suse 11.1 does not work.
>> And the install software native to Suse, cannot figure out.
>>
>> Does Suse have more problems installing software than Fedora or Ubuntu?
>
> Did you try installing the RPMs listed under your favorite CRAN mirror? (Note, I didn't use the "install" links for the Readme file; I think you could grab these using 'wget'.)
>
> http://streaming.stat.iastate.edu/CRAN/bin/linux/suse/11.1/RPMS/i586/R-base-2.9.0-2.1.i586.rpm
> -and devel-
> http://streaming.stat.iastate.edu/CRAN/bin/linux/suse/11.1/RPMS/i586/R-base-devel-2.9.0-2.1.i586.rpm
>
> You may also have to compile R from source...
>
>> Or is this a hassle for any Linux distro? And Windows?
>
> My Windows installs have been relatively hassle-free (5 workstations). I just finished setting up a small cluster (3) of Ubuntu R instances in under 30 minutes. So, your mileage may vary. I've found Ubuntu to be rather simple to install R instances on.
>
> Hope this helps
> c
[R] To hell with OpenSuse, ditch it and go to Ubuntu
This blog entry, http://www.viggie.com/blog/software/opensuse-ubuntu-usage-experience , if credible, would seem to suggest that there is no good reason to choose Suse. I really don't have time for such nonsense; maybe I'll just reinstall as Ubuntu.

Also, I noticed that GCC was not installed when Suse installed. That's just weird; GCC is part of Linux. And if you try to install GCC, the available options are, again, thoroughly confusing. It appears the software-install features of Suse are just not very robust.

Is Fedora any better? Do you think that blog post is accurate in comparing Ubuntu and Suse? Since Mandriva apparently has little market share or support in the US, I guess I won't go with that. So it's Ubuntu or Fedora.
[R] Installing R on Suse 11.1 , cannot figure it out
I can't figure out how the install works; it is certainly not automatic. Also, the "Install" option on the R web site for Suse 11.1 does not work. And I cannot figure out the install software native to Suse.

Does Suse have more problems installing software than Fedora or Ubuntu? Or is this a hassle for any Linux distro? And Windows?

Rob
[R] R on Linux, and R on Windows , any difference in maturity+stability?
Will R have more glitches on one operating system as opposed to another, or is it pretty much the same?

robert
[R] Gentleman and Ihaka's integrity in question
It does look like Gentleman and Ihaka not only lied to the New York Times, but also to the New Zealand Herald and who knows who else. This is disgusting. The R programming language is the S programming language, and Gentleman and Ihaka are not the ones who designed it. http://thenewyorktimesissloppy.blogspot.com/
[R] What does R have for age-adjusted survey analysis?
Is there a procedure that, after adjusting for sampling weights, also explicitly does an age adjustment to conform with the age distribution of an older census?
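The calculation being asked about is usually called direct age standardization. A minimal base-R sketch of the point estimate follows; the survey records, the age groups, and the standard population shares are all invented for illustration, and the sketch does not compute a standard error.

```r
# Hypothetical survey records: binary outcome, sampling weight, age group.
svy <- data.frame(
  y      = c(1, 0, 0, 1, 1, 0, 1, 1),
  w      = c(10, 20, 10, 30, 20, 10, 5, 15),
  agegrp = c("18-44", "18-44", "18-44", "45-64", "45-64", "45-64", "65+", "65+"),
  stringsAsFactors = FALSE
)

# Hypothetical standard population shares (e.g. from an older census).
std_share <- c("18-44" = 0.5, "45-64" = 0.3, "65+" = 0.2)
stopifnot(isTRUE(all.equal(sum(std_share), 1)))

# Weighted proportion within each age group.
grp_rate <- sapply(split(svy, svy$agegrp),
                   function(g) sum(g$w * g$y) / sum(g$w))

# Crude (weighted, unadjusted) proportion, and the age-adjusted proportion:
# group rates re-weighted to the standard age distribution.
crude    <- sum(svy$w * svy$y) / sum(svy$w)
adjusted <- sum(std_share[names(grp_rate)] * grp_rate)
```

With these made-up numbers the crude proportion is 2/3 and the adjusted one is 0.575, because the standard population puts more weight on the youngest group, which has the lowest rate.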
[R] survey statistics, rate/proportions with standard errors
What does R have to compare with, say, proc surveymeans, to estimate survey means/proportions with standard errors, using Taylor methods?
[R] AT&T Researchers and the New York Times
Is anyone in the leadership of the R-project going to contact the New York Times and clarify that the article gave remarkably short shrift to the people who designed the user interface for R, to a large extent AT&T researchers from an earlier generation? It would be the appropriate thing to do. The R team did not develop the user interface for R; the designers of the S programming language did. The lay reader of Vance's article will get the impression that R is a brand new invention, which is misleading and unfair. Gentleman and Ihaka should try harder to give credit where credit is due.

And by the way, ARE YOU GUYS EVER GOING TO FIX your mailing list platform? It is extremely user-unfriendly and a technological clunker. The mailing lists for SAS, Python, and others (UseNet) may not be a user-interface work of genius, but they are far superior to the R mailing list. What a clunker.
[R] Ashlee Vance's article on R in the New York Times
Ashlee Vance's article on R in the New York Times.

This is typical of the New York Times. Because they get to coast on the prestige and reputation of their brand, they have a history of just this sort of journalistic sloppiness. Whether it's the author or the editor at fault doesn't really matter; they do this screw-up all the time.

Look, if you write an article on the first page of the business section, you're not just presenting yourself as a writer or entertainer, you're presenting yourself as a journalist, and that implies two commitments:

1: I believe that my writing is true, and as fair and balanced as appropriate in the context.
2: I've invested the time in research and fact-checking so that point #1 actually has credibility.

Vance clearly fails on point #2. He just didn't do his homework. And as I've seen over the years, this is typical for NYT contributors. That's complacency. A bit like SAS Institute, the NYT is overly reliant on its brand name.

First of all, the third paragraph is a falsehood. I'm not saying Vance is lying. I'm saying he's lazy. A couple of hours of research, and he could have corrected that. If you find computer programming to be tedious, unpleasant, or quite difficult, then R is the wrong software for you. R has a reputation for having a tougher learning curve than the SAS programming language. Even if you disagree, neither is appropriate for people who don't have the time and patience to study programming languages.

Vance's article is also deeply misleading: he gives the wrong impression of where R actually came from, and who deserves credit for what. It's especially glaring given that he does briefly mention R's precursor, S. Yet, funny that, he neglects to mention that S and R basically use the same user interface (the same programming language). Hey Vance, um, that's a big oversight.
R is a quality software package, with years of development and debugging, substantial documentation, and diverse and reliable statistical function libraries. The R project team deserves a great deal of credit for this. But they don't deserve all of the credit. A great deal of the R software product was already achieved before the R team ever came along.

There is a tendency to pooh-pooh the blood and sweat that go into the design of the user interface. The choices made when designing the user interface of any data analysis tool are critical, whether GUI or language. Assuming the CPU is not overloaded, which is often the case, it is the user interface that makes the difference between a piece of cake and hours lost coding what should have been a routine task.

Well, Gentleman and Ihaka did not design the user interface for R. AT&T researchers did, during the Cold War. It's possible that a few employees at proprietary software companies also contributed. It might have been largely financed by American taxpayers, because there were a lot of backroom deals during the Cold War, and AT&T was typically in the thick of it. The user interface for R, otherwise known as the S programming language, has the same origins as C and Unix.

Some R promoters point out that R has lexical scope and lots of Scheme goodness (and what widespread programming language today does not have lexical scope?). But other R promoters point out that programs in S-Plus usually work in R, and vice versa. Well, in that case, it's the same damn programming language!

Quite likely, the R founders were careful to point this out in their interviews with Vance. Even if they forgot, minutes of research on Vance's part would have told him that. The New York Times: sloppy as usual. More like an advertisement than a bona fide article.
And the upshot of this, in the outlook for statistical software, is that regarding the strengths and (considerable) limitations of the three classical statistical programming languages (S, SAS, SPSS), R really doesn't change anything at all. I definitely like the price tag, though. And that does not mean that R cannot achieve a quality and reliability comparable to S-Plus and SAS, notwithstanding Milley's snide comment.

But if you want to attack the chronic and painful productivity problems with data preparation and statistical table production, you need to go beyond R and SAS. You have to develop new user interfaces, and that is very risky, and takes years of technical work and marketing. And, to be honest, that is not what open source developers are willing to do. In the majority of software categories, including specialized languages (such as statistical ones), open source developers are not motivated to develop user interfaces that make a ground-breaking difference in the user's productivity level.

One big and crucial exception is the category of all-purpose programming languages. Thousands of open source developers go to bed dreaming of being the next Larry Wall. Thankfully, we have Ruby and Python as a result.
[R] Estimating the standard error when you have sampling weights.
Hi,

Where can I find information (freely available on the Internet, and also books or other sources) on how having sampling weights changes the calculation of the standard error (of means and proportions)? How good is R for this type of procedure? And SAS?

Thanks,
Robert
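As a rough illustration of how weights enter the calculation, here is a base-R sketch of the Taylor-linearization standard error for a weighted mean. It assumes the simplest possible design (single-stage sampling treated as with-replacement, no strata, no clusters, no finite population correction), and the data are invented; for real surveys the survey package is the usual tool.

```r
# Hypothetical data: binary outcome and sampling weights.
y <- c(1, 0, 0, 1, 1, 0, 1, 1)
w <- c(10, 20, 10, 30, 20, 10, 5, 15)
n <- length(y)

# Weighted mean (a ratio estimator: weighted total over sum of weights).
ybar_w <- sum(w * y) / sum(w)

# Linearized variable for the ratio; its values sum to zero by construction.
z <- w * (y - ybar_w) / sum(w)

# Linearization standard error under the simple design assumed above.
se_lin <- sqrt(n / (n - 1) * sum(z^2))

# Naive standard error that ignores the weights, for comparison.
se_naive <- sd(y) / sqrt(n)
```

The two standard errors differ because unequal weights make some observations count for more than others, which generally increases the variance of the estimate relative to an equally weighted sample of the same size.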