On Wed, May 06, 2009 at 12:22:45AM -0400, Farrel Buchinsky wrote:
> Is R an appropriate tool for data manipulation and data reshaping and data
> organizing? I think so but someone who recently joined our group thinks not.
> The new recruit believes that python or another language is a far better
> tool for developing data manipulation scripts that can be then used by
> several members of our research group.

I happily use both approaches, depending on the original format the data come in:

For data that are not in a "well behaved" format and require actual parsing, I tend to use Python scripts for transmogrifying the data into nice and tidy tables (and maybe some very basic filtering). For everything after that I prefer R. I also use Python if the relevant data need to be harvested and assembled from many different sources (e.g. data files + web + databases).

Once the data files are easy to read (csv, tab separated, database, ...) and the task is to reshape, filter and clean the data, I usually do it in R. R has real advantages here:

- After reading a table into a data frame I can immediately tell whether all measurements are of the type they are supposed to be (integer, numeric, factor, boolean), and functions like read.table even do quite a bit of error checking for me (equal number of columns etc.)
- Finding out whether factors have the right (or a plausible) number of levels is easy
- Filtering by logical indexing
- Powerful and reliable reshaping (reshape package)
- Very convenient diagnostics: str(), dim(), table(), summary(), plotting the data in various ways, ...

cu
    Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
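
As an illustration of the Python parsing step described in the message above, here is a minimal sketch that turns a record-style text file into a tab-separated table. The input format (blank-line-separated "key: value" blocks), the column names, and the helper functions parse_records and to_tsv are all invented for this example and are not taken from the original post.

```python
import csv
import io

# Hypothetical "badly behaved" input: one record per block of "key: value"
# lines, blocks separated by blank lines. Invented purely for illustration.
RAW = """\
sample: A1
weight: 12.5
group: control

sample: A2
weight: n/a
group: treated

sample: A3
weight: 11.0
group: control
"""


def parse_records(text):
    """Turn blank-line-separated 'key: value' blocks into a list of dicts."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current record, if there is one.
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        # The last record is not followed by a blank line.
        records.append(current)
    return records


def to_tsv(records, columns, missing="NA"):
    """Write records as a tab-separated table; 'n/a' and absent keys become NA."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for rec in records:
        row = []
        for col in columns:
            value = rec.get(col, missing)
            row.append(missing if value.lower() == "n/a" else value)
        writer.writerow(row)
    return out.getvalue()


records = parse_records(RAW)
table = to_tsv(records, ["sample", "weight", "group"])
print(table)
```

A table written this way could then be loaded in R with something like read.table("samples.tsv", header = TRUE, sep = "\t") (the file name is hypothetical), after which the checks listed above (str(), summary(), table(), logical indexing) apply directly. Writing missing values as NA matches read.table's default na.strings setting.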