Re: [R] Matching rows in a Data set? I'm Stuck!!

Marc Schwartz Wed, 03 Mar 2010 08:26:27 -0800

On Mar 3, 2010, at 7:24 AM, BioStudent wrote:

> 
> Thanks!
> 
> I'm just trying to do it now but having issues with memory...
> 
> test <- merge(file1, file2, by.x = "col1")
> 
> will this give me the output I was hoping for
> 
> ID VALUE1 VALUE2
> 
> ?
> 
> Thanks



If you use 'by.x', you also need to use 'by.y', so that merge() knows which 
column(s) to use in each data set for the matching. Otherwise, as in my 
original example using 'by', merge() presumes that the same column name is 
present in both data sets.
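
As a minimal sketch of the 'by.x'/'by.y' form: the toy data frames below are 
made up for illustration, and the column name "key" in the second one is 
purely hypothetical, since I don't know what your second file actually calls 
its matching column.

```r
# Toy stand-ins for the two files; the contents are invented for illustration.
file1 <- data.frame(col1 = c("a", "b", "c"), VALUE1 = c(1, 2, 3))
file2 <- data.frame(key  = c("b", "c", "d"), VALUE2 = c(20, 30, 40))

# The key columns have different names, so name one per data set.
test <- merge(file1, file2, by.x = "col1", by.y = "key")

# The key column in the result takes its name from the first data set;
# rename it if you want the ID / VALUE1 / VALUE2 layout.
names(test)[1] <- "ID"
print(test)
```

Only the rows whose keys appear in both data frames come back, which is the 
default behavior of merge().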

You can use multiple columns in both data sets to define key combinations 
that yield a unique one-to-one row pairing. The result will also depend upon 
the settings of 'all', 'all.x' and 'all.y'. Review the help file for merge() 
(see ?merge). The default behavior (all = FALSE) returns only the rows that 
match between the two data sets.
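
To illustrate how 'all', 'all.x' and 'all.y' change the result, here is a 
small sketch using invented data in which each data frame has one ID that the 
other lacks. The object names are just for this example.

```r
# Invented data: "a" appears only in x, "d" appears only in y.
x <- data.frame(ID = c("a", "b", "c"), VALUE1 = c(1, 2, 3))
y <- data.frame(ID = c("b", "c", "d"), VALUE2 = c(20, 30, 40))

# Default (all = FALSE): only rows with an ID present in both.
inner <- merge(x, y, by = "ID")

# all.x = TRUE: keep every row of x; unmatched rows get NA for VALUE2.
left <- merge(x, y, by = "ID", all.x = TRUE)

# all = TRUE: keep every row of both; NA wherever there is no match.
full <- merge(x, y, by = "ID", all = TRUE)

print(inner)
print(left)
print(full)
```

The same 'by' argument accepts a character vector of several column names 
when a single column is not enough to identify a row.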

If the files are large and you are having memory allocation problems, then you 
basically have three choices:

1. Increase the amount of RAM that you have in the computer, which is limited 
if you are on a 32 bit OS.

2. Move to a 64 bit version of R on a 64 bit OS with sufficient RAM in the 
computer.

3. Perform your data management tasks using an appropriate database 
application, rather than in R. The work can be done completely in the database 
and the result then exported to R, or you can access the database from within 
R using one of the several methods available (e.g. ODBC). See the R 
Import/Export Manual at http://cran.r-project.org/doc/manuals/R-data.html.
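
As a rough sketch of the third option, the matching can be expressed as an SQL 
join and only the result pulled into R. Note the assumptions: this uses the 
DBI and RSQLite packages with an in-memory SQLite database rather than ODBC, 
simply because it needs no external server, and the table names and data are 
invented. With real files you would load them into the database outside of R, 
so that the full tables never have to sit in R's memory.

```r
# Assumes the DBI and RSQLite packages are installed; neither is mentioned
# in this thread, they are just a convenient stand-in for "a database".
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Invented toy tables standing in for the two large files.
file1 <- data.frame(ID = c("a", "b", "c"), VALUE1 = c(1, 2, 3),
                    stringsAsFactors = FALSE)
file2 <- data.frame(ID = c("b", "c", "d"), VALUE2 = c(20, 30, 40),
                    stringsAsFactors = FALSE)
dbWriteTable(con, "file1", file1)
dbWriteTable(con, "file2", file2)

# The inner join is done by the database; R only receives the matched rows.
matched <- dbGetQuery(con, "
  SELECT f1.ID, f1.VALUE1, f2.VALUE2
  FROM file1 AS f1
  INNER JOIN file2 AS f2 ON f1.ID = f2.ID
  ORDER BY f1.ID
")

dbDisconnect(con)
print(matched)
```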

HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
