[R] problem merging two data sets ( one with a header and one without)

2008-08-21 Thread kayj

I have two sets of data, Data1 and Data2. Data1 has a header and Data2 does
not. I would like to merge the two data sets after removing some columns
from Data2.

I am having a problem merging, so I had to write out and re-read the final
data, specifying "header=F", so that the merge can be done by "V1". Is there
a way to avoid this step? The problem is that when I use cbind, FinalData
has different column names.



Data1<-read.table("data1.txt", sep='\t', header=F, stringsAsFactors=F)

Data2<-read.table("data2.txt", sep='\t', header=T, stringsAsFactors=F)

P1<-cbind(Data2[,2])
P2<-cbind(Data2[,5:30])
FinalData<-cbind(P1,P2)
write.table(FinalData ,file="FinalData.txt", sep='\t', quote=F, col.names=F,
row.names=F)

Data3<-read.table("FinalData.txt", sep='\t', header=F, stringsAsFactors=F)
m<-merge(Data1,Data3, by="V1")


-- 
View this message in context: 
http://www.nabble.com/problem-merging-two-data-sets-%28-one-with-a-header-and-one-without%29-tp19090134p19090134.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem merging two data sets ( one with a header and one without)

2008-08-21 Thread Don MacQueen

merge() has by.x and by.y arguments. If you use them, you can merge
data frames that have different column names. You can specify columns
by name or by number. This is mentioned in the help for merge.


Try

   merge(Data1, Data2, by.x=1, by.y=2)

which will keep all of the columns in Data2, or

   merge(Data1, Data2[ ,c(2,5:30)] ,  by=1 )

if you must remove columns 1, 3, and 4 from Data2 (the subset keeps
column 2 and columns 5 through 30).
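
For anyone who wants to try the by.x/by.y form without the original files, here is a minimal sketch. The two small data frames and their column names (id, key, x) are made up for illustration and only stand in for Data1 and Data2.

```r
# Hypothetical stand-ins for the poster's data (values invented for the demo).
Data1 <- data.frame(V1 = c("a", "b", "c"), V2 = c(10, 20, 30),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(id = 1:3, key = c("c", "a", "d"), x = c(1.5, 2.5, 3.5),
                    stringsAsFactors = FALSE)

# Match column 1 of Data1 against column 2 of Data2.
# Only keys present in both ("a" and "c") survive the default inner join.
m <- merge(Data1, Data2, by.x = 1, by.y = 2)
print(m)
```

The key column in the result takes its name from the first data frame (V1 here), and the other columns of both inputs are carried along.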


Alternatively, since merge() by default joins on the variable names the two
data frames have in common, all you have to do is make sure that the single
column you want to use for the merge has the same name in both of them, and
that it is the only column with the same name in both. Thus, another way to
do the merge would be


  names(Data2)[2] <- 'V1'
  merge( Data1, Data2)

or

  names(Data2)[2] <- 'V1'
  merge( Data1, Data2[ , c(2,5:30)] )
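
A small sketch of the rename-then-merge approach, again on invented toy data (the names id, key, and x are assumptions, not the poster's real columns). It also checks which names the two data frames share before merging, since that is what merge() uses when no by argument is given.

```r
# Hypothetical stand-ins for Data1 and Data2.
Data1 <- data.frame(V1 = c("a", "b", "c"), V2 = c(10, 20, 30),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(id = 1:3, key = c("c", "a", "d"), x = c(1.5, 2.5, 3.5),
                    stringsAsFactors = FALSE)

# Give the key column in Data2 the same name it has in Data1.
names(Data2)[2] <- "V1"

# Confirm that exactly one column name is shared before relying on the default.
shared <- intersect(names(Data1), names(Data2))
print(shared)

# With no 'by' argument, merge() joins on the shared name(s).
m2 <- merge(Data1, Data2)
print(m2)
```

If the intersect() call returns more than one name, merge() would join on all of them, which is usually not what is wanted here.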


While I'm at it, using cbind() is unnecessary. You can replace

  P1<-cbind(Data2[,2])
  P2<-cbind(Data2[,5:30])
  FinalData<-cbind(P1,P2)

with

   FinalData <- Data2[, c(2,5:30)]

But even more unnecessary is the cbind() in

  P1<-cbind(Data2[,2])

all that is needed is

  P1<- Data2[,2]
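
To see why the cbind() calls get in the way, here is a rough illustration on a made-up three-column data frame. Wrapping a single extracted column in cbind() turns it into an unnamed one-column matrix, whereas subsetting the data frame directly keeps the original column names.

```r
# Hypothetical stand-in for Data2.
Data2 <- data.frame(id = 1:3, key = c("c", "a", "d"), x = c(1.5, 2.5, 3.5),
                    stringsAsFactors = FALSE)

# cbind() on a single extracted column gives a one-column matrix
# with no column name.
P1 <- cbind(Data2[, 2])
print(is.matrix(P1))
print(colnames(P1))

# Subsetting several columns at once keeps a data frame and its names.
FinalData <- Data2[, c(2, 3)]
print(names(FinalData))

# drop = FALSE keeps a data frame even when only one column is selected.
OneCol <- Data2[, 2, drop = FALSE]
print(names(OneCol))
```

Because the original name is lost at the cbind() step, the later cbind(P1, P2) has to come up with a name of its own for that column, which is the likely source of the "different column names" in the question.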


I personally think you're better off if you do not change the names 
of FinalData, but if you do, it's easier this way:


   names(FinalData) <- paste('V',1:27,sep='')

Or more generally

   names(FinalData) <- paste('V', seq(ncol(FinalData)), sep='')
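
As a quick check of the general form, on a made-up three-column data frame. The sketch uses seq_len() rather than seq(), which is my substitution and not part of the suggestion above: the two agree whenever there is at least one column, but seq(0) counts down from 1 to 0 while seq_len(0) is empty.

```r
# Hypothetical stand-in for FinalData.
FinalData <- data.frame(key = c("a", "b"), x = 1:2, y = 3:4,
                        stringsAsFactors = FALSE)

# Rename the columns V1, V2, ... based on however many there are.
names(FinalData) <- paste("V", seq_len(ncol(FinalData)), sep = "")
print(names(FinalData))
```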


By the way, although the text file you read with header=F does not have
a header, the resulting data frame does have a "header". That is,
read.table() gives it the column names V1, V2, and so on.
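
This is easy to confirm without any file on disk. The sketch below feeds read.table() a small made-up tab-separated string through its text argument and reads it with header=FALSE.

```r
# Two tab-separated lines with no header row (invented data).
txt <- "a\t10\nb\t20\n"

d <- read.table(text = txt, sep = "\t", header = FALSE,
                stringsAsFactors = FALSE)

# read.table() supplies default column names when there is no header.
print(names(d))
```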


-Don


--
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
