Re: [R] testing whether two character vectors contain (the same) items in the same order

Federico Calboli Fri, 07 Aug 2015 00:25:41 -0700

> On 7 Aug 2015, at 01:59, Bert Gunter <bgunter.4...@gmail.com> wrote:
> 
> Boris:
> 
> You may be right, but it seems like ESP to me, given the OP's
> non-description of "likelihood of coming from the same noisy process". My
> response would be: seek local statistical help, as your replies indicate a 
> good deal of statistical confusion.
> 
> Cheers,
> Bert


Bert,

As this is R-help and not Cross Validated, I am looking for a pre-canned function
that would test whether the order of characters in two character vectors comes
from the same (noisy) process. I would thus expect you to say something along
the lines of:

function X uses method Y to do something like that
function W uses method Z to do something like that
…

look into those, figure out exactly what you are testing and use the most
appropriate function.

The whys and wherefores are for me to deal with; I just want to know whether
someone has built a function that does, or seems to do, what I asked for. As I
said, this is R-help, and I am seeking help with using R.

I do concede that my original question might have left many wondering, but I
would have thought my reply to Boris cleared up any doubts. I am therefore puzzled
by the great deal of confusion on your part in understanding the purpose of my
question and, in general, of this list.

Best wishes

F


> 
> 
> 
> On Thursday, August 6, 2015, Boris Steipe <boris.ste...@utoronto.ca> wrote:
> You are looking for what is known as the "Cayley distance" between vectors - 
> an edit distance that allows only transpositions. RSeek mentions PerMallows 
> (https://cran.r-project.org/web/packages/PerMallows/PerMallows.pdf) and 
> Rankcluster 
> (https://cran.r-project.org/web/packages/Rankcluster/Rankcluster.pdf) as 
> packages that support work with Cayley distances. It seems to me that 
> distCayley() in Rankcluster does what you want. From the examples:
> 
> library(Rankcluster)
> x <- 1:5
> y <- c(2, 3, 1, 4, 5)
> distCayley(x, y)
> [1] 2
> 
> 
> Cheers,
> Boris
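To make concrete what a Cayley distance counts, here is a minimal base-R sketch that does not rely on Rankcluster or PerMallows. The function name cayley_distance is hypothetical and is not taken from either package; it uses the standard identity that the Cayley distance between two orderings of n distinct items equals n minus the number of cycles of the permutation mapping one onto the other. The letter vectors are hard-coded from the output printed in the original question.

```r
# Hypothetical helper (not from Rankcluster or PerMallows): Cayley distance
# between two orderings of the same distinct items, i.e. the minimum number
# of transpositions (swaps of any two items) turning one into the other.
# Uses the identity: distance = n - number of cycles of the permutation.
cayley_distance <- function(a, b) {
  stopifnot(length(a) == length(b), anyDuplicated(a) == 0, setequal(a, b))
  p <- match(b, a)          # p[i]: position in a of the i-th item of b
  n <- length(p)
  seen <- logical(n)        # all FALSE initially
  cycles <- 0L
  for (i in seq_len(n)) {
    if (!seen[i]) {
      cycles <- cycles + 1L # i starts a new cycle
      j <- i
      while (!seen[j]) {    # walk the cycle, marking its members
        seen[j] <- TRUE
        j <- p[j]
      }
    }
  }
  n - cycles
}

cayley_distance(1:5, c(2, 3, 1, 4, 5))  # 2: a single 3-cycle needs two swaps

# Vectors from the original question (x as printed there)
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]
cayley_distance(x, y)                   # 3: one swap plus one 3-cycle
```

Because any two items may be swapped in one step, the Cayley distance treats a long-range exchange the same as swapping neighbours; a Kendall (adjacent-swap) distance would penalise the long-range exchange more.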
> 
> 
> 
> 
> 
> On Aug 6, 2015, at 9:51 AM, Federico Calboli <federico.calb...@helsinki.fi> 
> wrote:
> 
> >>
> >> On 6 Aug 2015, at 15:40, Bert Gunter <bgunter.4...@gmail.com> wrote:
> >>
> >> Define "goodness of match" .  For exact matches, see ?"==" , all.equal, 
> >> etc.
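The exact-match checks suggested here can be tried on the vectors from the original question. A minimal base-R sketch, with x hard-coded from the output printed in that post so that no random seed is needed:

```r
# x as printed in the original post; y reorders it exactly as in that post
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

identical(x, y)          # FALSE: same items, but not in the same order
all(x == y)              # FALSE: element-wise comparison finds mismatches
isTRUE(all.equal(x, y))  # FALSE: all.equal() returns a description of the mismatches
sum(x != y)              # 5 positions differ
setequal(x, y)           # TRUE: the same items, ignoring order
```

These checks only answer "identical or not"; none of them grades how close the two orderings are.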
> >
> > Fair point.  I would define it as a number that tells me how likely it is 
> > that the same (noisy) process produced both lists.
> >
> > BW
> >
> > F
> >
> >
> >
> >
> >>
> >> Bert
> >>
> >> On Thursday, August 6, 2015, Federico Calboli 
> >> <federico.calb...@helsinki.fi> wrote:
> >> Hi All,
> >>
> >> Let’s assume I have a vector of letters drawn from the alphabet without replacement:
> >>
> >> x <- sample(letters, 15, replace = FALSE)
> >> x
> >> [1] "z" "t" "g" "l" "u" "d" "w" "x" "a" "q" "k" "j" "f" "n" "v"
> >>
> >> y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]
> >>
> >> I would now like to test how good a match y is for x. Obviously I can
> >> transform the letters into numbers and use a rank test, but I was left
> >> wondering whether this is the only solution and whether more
> >> appropriate solutions are already implemented in R (I am not going to
> >> reinvent the wheel if I can avoid it).
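The "letters into numbers plus rank test" idea can be sketched in base R as follows, assuming the x printed above (hard-coded for reproducibility). match() gives the position in x of each item of y, the count of discordant pairs is the Kendall distance (minimum number of adjacent swaps), and cor()/cor.test() with method = "kendall" give Kendall's tau and a test of no association. This is one possible rank-based approach, not the only one.

```r
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# Letters to numbers: position in x of each item of y
r <- match(y, x)
r   # 1 2 3 4 5 6 7 9 8 10 11 12 14 15 13

# Kendall distance: pairs (i < j) whose relative order is reversed,
# i.e. the minimum number of adjacent swaps turning y back into x
idx <- seq_along(r)
discordant <- sum(outer(idx, idx, "<") & outer(r, r, ">"))
discordant   # 3

# Kendall's tau between the two orderings (1 = identical order)
cor(idx, r, method = "kendall")   # 99/105, about 0.943

# Rank test of H0: tau = 0, i.e. y's order unrelated to x's
cor.test(idx, r, method = "kendall")
```

The p-value only tests whether the order of y is related to that of x at all; it is not a measure of how likely it is that both came from the same noisy process.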
> >>
> >> BW
> >>
> >> F
> >>
> >>
> >> --
> >> Federico Calboli
> >> Ecological Genetics Research Unit
> >> Department of Biosciences
> >> PO Box 65 (Biocenter 3, Viikinkaari 1)
> >> FIN-00014 University of Helsinki
> >> Finland
> >>
> >> federico.calb...@helsinki.fi
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide 
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >> --
> >> Bert Gunter
> >>
> >> "Data is not information. Information is not knowledge. And knowledge is 
> >> certainly not wisdom."
> >>   -- Clifford Stoll
> >>
> >
> >
> 
> 

