Re: [R] testing whether two character vectors contain (the same) items in the same order
And I probably should have included this link:
http://journal.r-project.org/archive/2014-1/loo.pdf

On 8/8/2015 12:50 PM, Robert Baer wrote:
> On 8/6/2015 5:25 AM, Federico Calboli wrote:
>> Hi All,
>>
>> let’s assume I have a vector of letters drawn only once from the alphabet:
>>
>> x = sample(letters, 15, replace = F)
>> x
>> [1] "z" "t" "g" "l" "u" "d" "w" "x" "a" "q" "k" "j" "f" "n" "v"
>>
>> y = x[c(1:7,9:8, 10:12, 14, 15, 13)]
>>
>> I would now like to test how good a match y is for x. Obviously I can
>> transform the letters into numbers and use a rank test, but I was left
>> wondering whether this is the only solution and whether there are more
>> appropriate solutions that are already implemented in R (I am not going to
>> reinvent the wheel if I can avoid it).
>>
>> BW
>>
>> F
>
> Perhaps
> install.packages("stringdist")
> help(package = 'stringdist')
>
>> --
>> Federico Calboli
>> Ecological Genetics Research Unit
>> Department of Biosciences
>> PO Box 65 (Biocenter 3, Viikinkaari 1)
>> FIN-00014 University of Helsinki
>> Finland
>>
>> federico.calb...@helsinki.fi

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] testing whether two character vectors contain (the same) items in the same order
On 8/6/2015 5:25 AM, Federico Calboli wrote:
> Hi All,
>
> let’s assume I have a vector of letters drawn only once from the alphabet:
>
> x = sample(letters, 15, replace = F)
> x
> [1] "z" "t" "g" "l" "u" "d" "w" "x" "a" "q" "k" "j" "f" "n" "v"
>
> y = x[c(1:7,9:8, 10:12, 14, 15, 13)]
>
> I would now like to test how good a match y is for x. Obviously I can
> transform the letters into numbers and use a rank test, but I was left
> wondering whether this is the only solution and whether there are more
> appropriate solutions that are already implemented in R (I am not going to
> reinvent the wheel if I can avoid it).
>
> BW
>
> F

Perhaps
install.packages("stringdist")
help(package = 'stringdist')
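A minimal sketch of how a string-distance approach could be applied to the vectors from the question. It assumes that every item is a single character, so each vector can be collapsed into one string without ambiguity, and it fixes x to the sample printed in the original post. The base R function utils::adist() is used for the Levenshtein distance; the stringdist call runs only if that package happens to be installed. The variable names (xs, ys, lev, osa) are illustrative.

```r
# The vectors from the original post, with x fixed to the printed sample
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# Collapse each vector into one string so string metrics can compare them
# (only safe because every item is exactly one character long)
xs <- paste(x, collapse = "")
ys <- paste(y, collapse = "")

# Base R: generalized Levenshtein distance; adist() returns a 1 x 1 matrix here
lev <- drop(utils::adist(xs, ys))
print(lev)

# Optional: stringdist offers further metrics, e.g. optimal string alignment,
# which counts a swap of two adjacent characters as a single edit
if (requireNamespace("stringdist", quietly = TRUE)) {
  osa <- stringdist::stringdist(xs, ys, method = "osa")
  print(osa)
}
```

Which metric is appropriate depends on what kind of reordering should count as "noise"; the sketch only shows the mechanics.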
Re: [R] testing whether two character vectors contain (the same) items in the same order
On Aug 7, 2015, at 12:22 AM, Federico Calboli wrote:

>> On 7 Aug 2015, at 01:59, Bert Gunter wrote:
>>
>> Boris:
>>
>> You may be right, but it seems like ESP to me, based on the OP's
>> non-description of likelihood of coming from the same noisy process. My
>> response would be: seek local statistical help, as your replies indicate a
>> good deal of statistical confusion.
>>
>> Cheers,
>> Bert
>
> Bert,
>
> as this is R-help and not Cross Validated I am looking for a precanned
> function that would test whether the order of characters in two character
> vectors comes from the same (noisy) process. I would thus expect you to say
> something along the lines of:
>
> function X uses method Y to do something like that
> function W uses method Z to do something like that
> …
>
> look into those, figure out exactly what you are testing and use the most
> appropriate function.
>
> The whys and wherefores are for me to deal with, I just want to know whether
> someone has built a function that does, or seems to do, what I asked for. As
> I said, this is R-help, and I seek help for R use.
>

library(sos)   # provides findFn()
findFn("levenshtein")
found 57 matches;  retrieving 3 pages
2 3
Downloaded 44 links in 17 packages.

# y is already a character vector, so collapse it directly;
# letters[y] would index an unnamed vector by name and return only NAs
stringdist::stringdist( paste0(x, collapse=""),
                        paste0(y, collapse="") )
[1] 3

--
HTH;
David.

> I do concede that my original question might have left many wondering, but I
> guess my reply to Boris would have cleared any doubts. I am therefore
> puzzled by the great deal of confusion on your part in understanding the
> purpose of my question and, in general, of this list.
>
> Best wishes
>
> F
Re: [R] testing whether two character vectors contain (the same) items in the same order
> On 7 Aug 2015, at 01:59, Bert Gunter wrote:
>
> Boris:
>
> You may be right, but it seems like ESP to me, based on the OP's
> non-description of likelihood of coming from the same noisy process. My
> response would be: seek local statistical help, as your replies indicate a
> good deal of statistical confusion.
>
> Cheers,
> Bert

Bert,

as this is R-help and not Cross Validated I am looking for a precanned
function that would test whether the order of characters in two character
vectors comes from the same (noisy) process. I would thus expect you to say
something along the lines of:

function X uses method Y to do something like that
function W uses method Z to do something like that
…

look into those, figure out exactly what you are testing and use the most
appropriate function.

The whys and wherefores are for me to deal with, I just want to know whether
someone has built a function that does, or seems to do, what I asked for. As
I said, this is R-help, and I seek help for R use.

I do concede that my original question might have left many wondering, but I
guess my reply to Boris would have cleared any doubts. I am therefore
puzzled by the great deal of confusion on your part in understanding the
purpose of my question and, in general, of this list.

Best wishes

F
Re: [R] testing whether two character vectors contain (the same) items in the same order
Boris:

You may be right, but it seems like ESP to me, based on the OP's
non-description of likelihood of coming from the same noisy process. My
response would be: seek local statistical help, as your replies indicate a
good deal of statistical confusion.

Cheers,
Bert

On Thursday, August 6, 2015, Boris Steipe wrote:

> You are looking for what is known as the "Cayley distance" between vectors -
> an edit distance that allows only transpositions. RSeek mentions PerMallows
> (https://cran.r-project.org/web/packages/PerMallows/PerMallows.pdf) and
> Rankcluster
> (https://cran.r-project.org/web/packages/Rankcluster/Rankcluster.pdf) as
> packages that support work with Cayley distances. It seems to me that
> distCayley() in Rankcluster does what you want. From the examples:
>
> x=1:5
> y=c(2,3,1,4,5)
> distCayley(x,y)
> 8
>
> Cheers,
> Boris

--
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
   -- Clifford Stoll
Re: [R] testing whether two character vectors contain (the same) items in the same order
You are looking for what is known as the "Cayley distance" between vectors -
an edit distance that allows only transpositions. RSeek mentions PerMallows
(https://cran.r-project.org/web/packages/PerMallows/PerMallows.pdf) and
Rankcluster
(https://cran.r-project.org/web/packages/Rankcluster/Rankcluster.pdf) as
packages that support work with Cayley distances. It seems to me that
distCayley() in Rankcluster does what you want. From the examples:

x=1:5
y=c(2,3,1,4,5)
distCayley(x,y)
8

Cheers,
Boris

On Aug 6, 2015, at 9:51 AM, Federico Calboli wrote:

>> On 6 Aug 2015, at 15:40, Bert Gunter wrote:
>>
>> Define "goodness of match". For exact matches, see ?"==", all.equal, etc.
>
> Fair point. I would define it as a number that tells me how likely it is
> that the same (noisy) process produced both lists.
>
> BW
>
> F
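To make the idea of a transposition-only edit distance concrete, here is a base R sketch. It relies on the standard result that the Cayley distance between two orderings of the same n distinct items equals n minus the number of cycles of the permutation that maps one ordering onto the other. The function name cayley_distance is hypothetical and is not the Rankcluster implementation; whether distCayley() follows the same convention has not been checked here.

```r
# Cayley distance between two orderings of the same distinct items:
# n minus the number of cycles of the permutation taking x to y.
# (cayley_distance is an illustrative helper, not a package function.)
cayley_distance <- function(x, y) {
  stopifnot(length(x) == length(y), !anyDuplicated(x), setequal(x, y))
  p <- match(y, x)               # position in x of each element of y
  n <- length(p)
  seen <- logical(n)
  cycles <- 0L
  for (i in seq_len(n)) {
    if (!seen[i]) {
      cycles <- cycles + 1L      # found a cycle not visited before
      j <- i
      while (!seen[j]) {         # walk around the cycle containing i
        seen[j] <- TRUE
        j <- p[j]
      }
    }
  }
  n - cycles
}

# The numeric example quoted above: a 3-cycle is two transpositions away
d_num <- cayley_distance(1:5, c(2, 3, 1, 4, 5))

# The letters from the original question, with x fixed to the printed sample
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]
d_chr <- cayley_distance(x, y)
```

Because the helper works through match(), it accepts character vectors directly and does not require the letters to be recoded as numbers first.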
Re: [R] testing whether two character vectors contain (the same) items in the same order
> On 6 Aug 2015, at 15:40, Bert Gunter wrote:
>
> Define "goodness of match". For exact matches, see ?"==", all.equal, etc.

Fair point. I would define it as a number that tells me how likely it is
that the same (noisy) process produced both lists.

BW

F
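One way to turn "how likely" into a number is a Monte Carlo permutation test, sketched below under strong assumptions that are not part of the thread: disagreement is measured by the number of inverted pairs, and the reference (null) model is that every ordering of the items is equally likely. That null is only a baseline for "no relationship at all"; it is not a model of any particular noisy process. The helper count_inversions and all variable names are illustrative.

```r
# Number of pairs that appear in opposite order (inversions) in a permutation
count_inversions <- function(p) {
  n <- length(p)
  if (n < 2) return(0L)
  sum(vapply(seq_len(n - 1),
             function(i) sum(p[i] > p[(i + 1):n]),
             integer(1)))
}

# Vectors from the original question, with x fixed to the printed sample
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# Observed disagreement between the two orderings
observed <- count_inversions(match(y, x))

# Null distribution: inversions of uniformly random orderings (an assumption)
set.seed(1)
n_sim <- 2000
null_inv <- replicate(n_sim, count_inversions(sample(length(x))))

# Share of random orderings at least as close to x as y is
# (the +1 terms keep the estimate away from exactly zero)
p_value <- (sum(null_inv <= observed) + 1) / (n_sim + 1)
```

A small p_value here only says that y is far closer to x than a random shuffle would be; it does not by itself say that the same process produced both lists.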
Re: [R] testing whether two character vectors contain (the same) items in the same order
Define "goodness of match". For exact matches, see ?"==", all.equal, etc.

Bert

On Thursday, August 6, 2015, Federico Calboli wrote:

> Hi All,
>
> let’s assume I have a vector of letters drawn only once from the alphabet:
>
> x = sample(letters, 15, replace = F)
> x
> [1] "z" "t" "g" "l" "u" "d" "w" "x" "a" "q" "k" "j" "f" "n" "v"
>
> y = x[c(1:7,9:8, 10:12, 14, 15, 13)]
>
> I would now like to test how good a match y is for x. Obviously I can
> transform the letters into numbers and use a rank test, but I was left
> wondering whether this is the only solution and whether there are more
> appropriate solutions that are already implemented in R (I am not going to
> reinvent the wheel if I can avoid it).
>
> BW
>
> F
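For reference, the rank-test route mentioned in the question can be sketched in base R as follows. The letters are converted to positions with match() and compared with Kendall's tau through cor() and cor.test() from the stats package. Fixing x to the printed sample and the variable names are choices made for this illustration only.

```r
# Vectors from the original question, with x fixed to the printed sample
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# Turn the letters into ranks: position of each item in x and in y
rank_x <- seq_along(x)
rank_y <- match(x, y)

# Kendall's tau: 1 means identical order, -1 means exactly reversed order
tau <- cor(rank_x, rank_y, method = "kendall")

# Test of the null hypothesis of no association between the two orderings
kt <- cor.test(rank_x, rank_y, method = "kendall")
print(tau)
print(kt$p.value)
```

Note that the test asks whether the two orderings are associated at all, which is a weaker statement than saying they arose from the same process.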