Re: [R] simple matching with R

Jeffrey Robert Spies Fri, 28 Sep 2007 10:25:17 -0700

> The next problem for me is now to deal with the NAs. I thought  
> perhaps it is possible to exclude the variable from the row  
> comparison if in one of the rows is an NA?


If you exclude the NAs in one dataset, you'll need to exclude the
exact same rows in the other dataset.  The question to ask here is:
qualitatively, are a row of NAs and a row of binary values completely
dissimilar?  I assumed so in my first specification of the solution,
but perhaps that is not the case.  And what about a row with just one
NA?  Do you exclude the whole row, impute a value, or count it in
your measure of dissimilarity?  You could also choose to penalize NAs
more than a mismatch, if you'd like.  This all depends on your data.
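
To make those options a bit more concrete, here is a rough sketch of
three ways to score one compared row that contains NAs.  The function
names (countAsMismatch, skipNAs, penalizeNAs) and the penalty weight
of 2 are made up for illustration; they are not part of the earlier
code, and which one is appropriate is an assumption you would have to
check against your data.

```r
# One row of an element-wise comparison: TRUE = match, FALSE = mismatch,
# NA = at least one of the two compared values was missing.
tRow <- c(TRUE, FALSE, NA, TRUE, NA, FALSE, TRUE, TRUE, TRUE)

# Option 1: treat every NA as a mismatch.  This gives the same count as
# length(tRow[tRow==FALSE]), because indexing with NA keeps an NA entry.
countAsMismatch <- function(tRow) {
    sum(is.na(tRow) | !tRow)
}

# Option 2: ignore NAs and count only the real mismatches.
skipNAs <- function(tRow) {
    sum(!tRow, na.rm = TRUE)
}

# Option 3: count real mismatches, but weight each NA more heavily.
# naPenalty is a hypothetical tuning parameter.
penalizeNAs <- function(tRow, naPenalty = 2) {
    sum(!tRow, na.rm = TRUE) + naPenalty * sum(is.na(tRow))
}

countAsMismatch(tRow)
skipNAs(tRow)
penalizeNAs(tRow)
```

Any of these can be passed to apply in place of dissimilar, so trying
a different NA policy only means swapping the function.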

> Furthermore it would be useful than to divide the resulting number  
> by the number of used variables for the comparison to get back a  
> number between 0 and 1.

One nice thing about creating a dissimilar function and using apply
is that we only need to alter that function:

dissimilar <- function(tRow){
        length(tRow[tRow==FALSE])/length(tRow)
}

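You wanted to divide the number of FALSEs by the number of variables
in the comparison, hence dividing length(tRow[tRow==FALSE]) by
length(tRow).  Remember, tRow is each row of the comparison passed
from apply.  Note that NA entries are still counted as mismatches
here and are still included in length(tRow).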
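
If you would rather leave NA comparisons out, as you suggested, one
possible variant divides by the number of entries that were actually
compared.  This is only a sketch: the name dissimilarNoNA and the small
male/female data frames are invented for the example, and returning NA
for a row with nothing to compare is an assumption you may want to
change.

```r
# Small stand-ins for the two datasets (same layout, fewer columns).
male <- data.frame(V1 = c(0, NA, 0), V2 = c(0, NA, 1), V3 = c(1, NA, NA))
female <- data.frame(V1 = c(1, 0, 0), V2 = c(0, 1, 1), V3 = c(0, 1, 0))

# Entry-by-entry comparison; any comparison involving an NA is itself NA.
comparison <- male == female

dissimilarNoNA <- function(tRow) {
    used <- sum(!is.na(tRow))           # variables present in both rows
    if (used == 0) return(NA_real_)     # nothing to compare in this row
    sum(!tRow, na.rm = TRUE) / used     # share of mismatches among those
}

dissimilarity <- apply(comparison, 1, dissimilarNoNA)
dissimilarity
```

With this toy data the first row has two mismatches out of three
compared variables (2/3), the second row is all NA and so returns NA,
and the third row has no mismatches among its two usable variables (0).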

> Unfortunately I am able to understand what happens if somebody  
> gives me the code but I am not able at the moment to write it by  
> myself. I hope this will change by and by.

The more examples you see and play with, the more you'll understand.

> So I would be very pleased if you could help me once again.
>
> Greetings
>
> Birgit

Cheers,

Jeff.

>
> On 28.09.2007 at 18:25, Jeffrey Robert Spies wrote:
>
>> Not sure how you want to handle the NAs, but you could try the
>> following:
>>
>> #start
>> MalVar29_37 <- read.table(textConnection("V1 V2 V3 V4 V5 V6 V7 V8 V9
>> 0  0  0  0  0  1  0  0  0
>> 0  0  0  0  0  1  0  0  0
>> 0  0  0  0  0  1  0  0  0
>> NA NA NA NA NA NA NA NA NA
>> 0  1  0  0  0  1  0  0  0"), header=TRUE)
>>
>> FemVar29_37 <- read.table(textConnection("V1 V2 V3 V4 V5 V6 V7 V8 V9
>> 1  1  0  0  0  0  0  0  0
>> 0  1  0  0  1  1  0  0  0
>> 1  0  0  1  0  0  0  0  0
>> 0  1  0  0  1  0  0  0  0
>> 0  1  0  0  0  0  0  0  0"), header=TRUE)
>>
>> comparison <- MalVar29_37 == FemVar29_37
>>
>> dissimilar <- function(tRow){
>>      length(tRow[tRow==FALSE])
>> }
>>
>> dissimilarity <- apply(comparison, c(1), dissimilar)
>> dissimilarity
>> # finish
>>
>> Variable comparison is an entry by entry comparison, resulting in
>> values of TRUE or FALSE.  I've defined a function dissimilar as the
>> number of FALSEs in a given object (tRow).  Variable dissimilarity is
>> then the application of this dissimilar function for each row of
>> comparison.  In this example, 0 means all of the entries in a row
>> matched, 9 means none of them matched.  You can see the solution here
>> in recipe form: http://www.r-cookbook.com/node/40
>>
>> Hope this helps,
>>
>> Jeff.
>>
>> On Sep 28, 2007, at 11:13 AM, Birgit Lemcke wrote:
>>
>>> Hello!
>>>
>>> I am an R beginner and I have a question about a simple matching.
>>>
>>> I have two datasets that I read in with:
>>>
>>> MalVar29_37<-read.table("MalVar29_37.csv", sep = ";")
>>> FemVar29_37<-read.table("FemVar29_37.csv", sep = ";")
>>>
>>> They look like this and show binary variables:
>>>
>>>      V1 V2 V3 V4 V5 V6 V7 V8 V9
>>> 1    0  0  0  0  0  1  0  0  0
>>> 2    0  0  0  0  0  1  0  0  0
>>> 3    0  0  0  0  0  1  0  0  0
>>> 4   NA NA NA NA NA NA NA NA NA
>>> 5    0  1  0  0  0  1  0  0  0
>>>
>>>      V1 V2 V3 V4 V5 V6 V7 V8 V9
>>> 1    1  1  0  0  0  0  0  0  0
>>> 2    0  1  0  0  1  1  0  0  0
>>> 3    1  0  0  1  0  0  0  0  0
>>> 4    0  1  0  0  1  0  0  0  0
>>> 5    0  1  0  0  0  0  0  0  0
>>>
>>> each with 348 rows.
>>>
>>> I would like to perform a simple matching but only row 1 compared to
>>> row1, row 2 compared to row 2 (paired).......giving back a number as
>>> dissimilarity for each comparison.
>>>
>>> How can I do that?
>>>
>>> Thanks in advance
>>>
>>> Birgit
>>>
>>>
>>>
>>>
>>> Birgit Lemcke
>>> Institut für Systematische Botanik
>>> Zollikerstrasse 107
>>> CH-8008 Zürich
>>> Switzerland
>>> Ph: +41 (0)44 634 8351
>>> [EMAIL PROTECTED]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

